Control Groups and Subsystem Observability
Marion Sudvarg, Chris Gill, Brian Kocoloski, Son Dinh
CSE 522S – Advanced Operating Systems
Washington University in St. Louis
St. Louis, MO 63130
So far, we have discussed:
- Observability: hardware performance monitoring, file event monitoring (inotify), the proc filesystem, and tools such as top, htop, and ps
- Isolation: chroot, namespaces, and capabilities
These provide mechanisms to build and monitor containers.
But what about control? How do we control and limit the use of system resources?
Today:
- Resource limits
- Introduction to control groups (cgroups)
- Observing subsystem-wide resource use
Resource Limits
Processes on a system must share limited resources: CPU bandwidth, memory, disk/storage space, etc.
A process can view its own resource usage with getrusage()
A resource limit constrains a process’s usage of a specific resource:
setrlimit(int resource, const struct rlimit *rlim)
getrlimit(int resource, struct rlimit *rlim)
where struct rlimit holds {rlim_cur (soft limit), rlim_max (hard limit)}
- Soft limit: the effective (current) limit
- Hard limit: an enforced ceiling on the soft limit; raising it requires CAP_SYS_RESOURCE (an unprivileged process can only lower it, irreversibly)
setrlimit() only sets the calling process’s own limits
- Useful to constrain a program: set the limit, then exec() it (limits persist across exec())
- Not useful for constraining other, already-running processes
Inspecting Resource Limits
RLIMIT_CPU
RLIMIT_FSIZE
RLIMIT_DATA
RLIMIT_STACK
RLIMIT_CORE
RLIMIT_RSS
RLIMIT_NPROC
RLIMIT_NOFILE
RLIMIT_MEMLOCK
RLIMIT_AS
RLIMIT_LOCKS
RLIMIT_SIGPENDING
RLIMIT_MSGQUEUE
RLIMIT_NICE
RLIMIT_RTPRIO
RLIMIT_RTTIME
Resource Limit Example: Constraining Memory Usage
What if we want to constrain a process to 128 MB of RAM?
    const struct rlimit lim = { .rlim_cur = 0x8000000, .rlim_max = 0x8000000 }; /* 128 MiB */
    setrlimit(RLIMIT_RSS, &lim);
    const char *path = "./myprogram";
    execl(path, path, (char *)NULL);
Now, ./myprogram cannot use more than 128 MB of main memory
This prevents it from interfering with the system
Or does it?
RLIMIT_RSS: resident set size, i.e., the portion of the process’s memory held in main memory (not in swap)
Caveat: modern Linux kernels do not actually enforce RLIMIT_RSS (it only had an effect in Linux 2.4.x, x < 30); RLIMIT_AS, which caps virtual address space, is enforced
Counterexample: Anomalous Process Behavior
A classic adversarial attack is the “fork bomb”
- A planted process repeatedly forks children (which recursively and repeatedly fork children of their own, etc.)
- Eventually it uses up system resources, resulting in a denial of service for other users (possibly freezing the system)
However, anomalous isn’t always adversarial:
- A process may fork children to retry failed transactions, or to run redundant transactions for fault tolerance
- Misconfiguration or unanticipated environmental conditions may lead to a feedback loop that results in a process cascade
“Never attribute to malice what can be adequately explained by someone else’s feature”
Terry Tidwell, PhD 2011, Washington University
Limitations of Resource Limits
If a process forks, every child has the same resource limits as the parent
- Each child gets its own copy of the limit rather than a share of it, which allows resource limits to be circumvented!
- A process with n children can now use (n+1) × rlimit units of the resource
What about RLIMIT_NPROC?
- Limits the number of processes (on Linux, more precisely threads) for a user’s real user ID
- RLIMIT_NPROC × RLIMIT_RSS: an upper bound on a user’s RAM usage
But this lacks flexibility:
- What if a user wants to run a single memory-intensive process?
- What if a user wants to run more than RLIMIT_NPROC processes, but each uses very little memory?
How can we constrain resource usage in a flexible, principled fashion?
Answer: Control Groups
Linux Control Groups
Constrain resource usage for hierarchical groups of processes using various controllers:
- Memory (today’s focus)
- CPUs
- I/O
- Network
- Number of processes
- Performance monitoring events
A forked child process is added to its parent’s cgroup
The hierarchy is reflected as a pseudo-filesystem (“everything is a file”)
- Limits are set through reading/writing file interfaces
- Enables file event multiplexing techniques for monitoring
[Figure: the cgroups pseudo-filesystem hierarchy rooted at /sys/fs/cgroup/unified, which implicitly contains all system processes not in a child cgroup; child cgroups shown include /containers, c1, c2, c3, and cg1]
cgroups v1 vs v2
cgroups come in two flavors: v1 and v2
- Going forward, we will talk about v2 unless specified otherwise
v2 was released to reduce complexity compared to v1:
- v2 provides a unified hierarchy for all cgroups controllers; v1 allowed separate pseudo-filesystems to be mounted for different controllers
- v2 allows different controllers to be active at different levels of the hierarchy
- v2 requires that only “leaf node” cgroups have processes (besides the root cgroup)
- v1 allowed different threads of the same process to be in different cgroups; v2 restricts this behavior in a way that preserves the process/thread hierarchy
- v2 provides improved notification for empty cgroups with no processes: v1 allowed each cgroup to specify a program to launch when it empties, while v2 uses a status file (cgroup.events) instead, which allows a single program to multiplex across cgroups
v1 is still available for backward compatibility
- Originally, v2 did not implement all v1 controllers; it now supports most of the same features
Memory Control Groups
Enable observation and control over the memory use of a group of processes
- Includes kernel memory used on behalf of the processes, e.g., dentries and inodes
- Memory limits are ultimately enforced by the kernel’s Out Of Memory (OOM) killer
- Remember, only leaf nodes have processes: interfaces in a non-leaf cgroup control all descendant cgroups in the hierarchy
Interfaces include:
memory.current: Shows current memory usage (in bytes)
memory.stat: A read-only key-value file with details of memory allocation
memory.high: A threshold above which processes are throttled and memory is aggressively reclaimed
memory.max: A hard limit; above this threshold, the OOM killer is invoked on processes whose memory can’t be reclaimed, terminating them if necessary
memory.events: A read-only file with key-value pairs counting events triggered by limits (a change in its values generates a file-modified event)
Complete list at: https://docs.kernel.org/admin-guide/cgroup-v2.html#memory
Cgroups Delegation
A subtree of the cgroups hierarchy can be delegated to a nonprivileged user
Here, uid 1000 has control over user1000_containers:
- They can’t modify the user1000_containers interface files, so the subtree’s resource usage remains constrained
- They can create and manage child cgroups
- They can move processes within the subtree
cgroups namespaces can define delegation boundaries
[Figure: DELEGATION. The cgroups pseudo-filesystem with the delegated subtree /user1000_containers; cgroups shown include c1, c2, c3, and cg1]
Cgroups Namespaces
Constrain processes to viewing a portion of the cgroup hierarchy, e.g.:
/sys/fs/cgroup/unified/containers/c1 → /c1
Highly useful in container environments!
- Prevents information leaks
- Easier container migration
- Prevents container processes from accessing ancestor cgroup directories
Reading Assignments
LSP ch. 9: Covers userspace interaction with Linux kernel memory management mechanisms
LSP pp. 204-209: Covers resource limits
We provide a condensed PDF focusing on relevant sections of the man 7 cgroups man page
man 7 cgroup_namespaces: Coverage of the cgroups namespace type
The Memory Controller section of the Control Group v2 pages in the Linux Kernel Documentation (stop reading at the IO header)
(Optional) DKR pp. 256-260: A brief overview of Docker’s interaction with cgroups
Today’s Studio
Configure your Raspberry Pi to mount the cgroups v2 hierarchy
Experiment with using the memory controller to set and detect limits and events:
- First, to enforce a hard limit on memory usage
- Then, to enforce a soft limit, using inotify to monitor for events triggered when this threshold is passed
Integrate cgroups into your simple container environment:
- Observe and control your container’s memory usage
- Use cgroup namespaces to isolate your container’s view of the cgroups hierarchy