Security Containers: Control Groups

Lessandro Z. Ugulino
Oct 11, 2023

Today, I’d like to talk about one of the fundamental building blocks used to make containers: control groups, commonly known as cgroups.

Basically, cgroups limit the resources, such as memory, CPU, and network input/output, that a group of processes can use.

In terms of security, cgroups can ensure that one process can’t affect the behaviour of another by hogging all the resources, for example by using all the CPU or memory to starve other applications.

cgroup: “How much you can use”
namespace: “What you can see”

Let’s see how cgroups are organized.

Cgroup Hierarchies

Each type of resource has its own hierarchy, managed by a cgroup controller. Every Linux process is a member of one cgroup in each hierarchy.

The Linux kernel communicates information about cgroups through a set of pseudo-filesystems that are usually mounted at /sys/fs/cgroup.

Managing cgroups involves reading and writing the files and directories within these hierarchies.

The below image describes the memory cgroup.

Some of these files are written by the kernel and are purely informational; others are parameters that you can modify. There’s no way to tell which is which without consulting the documentation, although the purpose of some files is intuitive from their names. For example, memory.limit_in_bytes holds a writable value that sets the amount of memory available to processes in the group, while memory.max_usage_in_bytes reports the peak memory usage within the group.
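Since everything is a file, reading or changing a parameter needs nothing more than cat and a redirect. Here is a minimal sketch, assuming the cgroups v1 memory hierarchy is mounted at /sys/fs/cgroup/memory; the helper names cgroup_read and cgroup_write and the CGROUP_ROOT variable are my own, not part of any tool.

```shell
# Root of the memory hierarchy (assumption: cgroups v1 layout).
# Override CGROUP_ROOT to point somewhere else.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup/memory}"

# Print the value stored in one file of a cgroup.
# $1: cgroup path relative to the hierarchy root
# $2: file name, e.g. memory.max_usage_in_bytes
cgroup_read() {
    cat "$CGROUP_ROOT/$1/$2"
}

# Write a value to a writable parameter file (needs root on a real host).
# $1: cgroup path, $2: file name, $3: new value
cgroup_write() {
    printf '%s\n' "$3" > "$CGROUP_ROOT/$1/$2"
}
```

For example, cgroup_read docker memory.limit_in_bytes would print the limit of a cgroup named docker, if one exists on your machine.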

If you want to limit memory usage for a process, you’ll need to create a new cgroup and then assign the process to it.

Creating Cgroups

When you create a subdirectory inside this memory directory, you’re creating a cgroup, and the kernel will automatically populate the directory with the various files that represent parameters and statistics about the cgroup:

As you can see, some of these files hold parameters that define limits (for example, memory.limit_in_bytes), while others report statistics about the current use of resources in the control group (for example, memory.usage_in_bytes).
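Creating a cgroup by hand could look like the sketch below. It assumes a cgroups v1 hierarchy such as /sys/fs/cgroup/memory and root privileges; the function names and the group name demo are hypothetical.

```shell
# Create a child cgroup by making a directory under a hierarchy root,
# then list what is inside it. On a real cgroup filesystem the kernel
# fills the new directory with parameter and statistics files.
# $1: hierarchy root, e.g. /sys/fs/cgroup/memory
# $2: name of the new cgroup, e.g. demo
cgroup_create() {
    mkdir -p "$1/$2" && ls "$1/$2"
}

# Remove a cgroup again. The kernel only allows this once the cgroup
# has no member processes and no child cgroups.
cgroup_remove() {
    rmdir "$1/$2"
}

# Example (as root):
#   cgroup_create /sys/fs/cgroup/memory demo
#   cgroup_remove /sys/fs/cgroup/memory demo
```

Note that removal uses rmdir rather than rm -r: the files inside a cgroup belong to the kernel and can’t be deleted individually.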

The container runtime will create new cgroups when you start a container. You can use the lscgroup command (part of the libcgroup tools, packaged as cgroup-tools on Debian and Ubuntu) to list all cgroups.

Let’s see the difference in memory when you start a container.

Take a snapshot of the memory cgroups:

lscgroup memory:/ > before.memory

Start a container:

docker run --name nginx -p 80:80 -d nginx

Take another snapshot of the memory cgroups and compare both:

lscgroup memory:/ > after.memory

diff before.memory after.memory
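If you only care about the cgroups that were added, you can filter the diff output. This is a small sketch, assuming the two snapshot files contain one cgroup per line as lscgroup prints them; new_cgroups is a name I made up.

```shell
# Print cgroups that appear in the second snapshot but not in the first.
# $1: snapshot taken before, $2: snapshot taken after
new_cgroups() {
    # diff exits non-zero when the files differ, so its status is ignored.
    # Lines starting with "> " exist only in the second file.
    { diff "$1" "$2" || true; } | sed -n 's/^> //p'
}

# Example:
#   new_cgroups before.memory after.memory
```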

While the container is still running, we can inspect its cgroup from the host. From within the memory hierarchy directory:

ls docker/d46b4e91ea4f13aa86134306e364fb5906f184ade224911ebd52e4ff7f2fbd61

From inside the container, the list of cgroups it belongs to is available under the /proc directory:
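The kernel exposes this list in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:path. As a rough sketch, you could pull out the path for a single controller like this; cgroup_path_for is a hypothetical helper, and the container ID in the sample comment is invented.

```shell
# Print the cgroup path for one controller from a /proc/<pid>/cgroup file.
# $1: controller name, e.g. memory
# $2: file to read (defaults to /proc/self/cgroup)
cgroup_path_for() {
    awk -v ctrl="$1" '
        {
            line = $0
            # Field 2 is a comma-separated controller list, e.g. cpu,cpuacct.
            split(line, parts, ":")
            n = split(parts[2], names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == ctrl) {
                    # Everything after the second colon is the path.
                    sub(/^[^:]*:[^:]*:/, "", line)
                    print line
                    exit
                }
            }
        }' "${2:-/proc/self/cgroup}"
}

# Example: for a line such as "12:memory:/docker/abc123",
#   cgroup_path_for memory
# prints /docker/abc123.
```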

Once you have a cgroup, you can modify parameters within it by writing to the appropriate files.

Setting Resource Limits

The file memory.limit_in_bytes will show how much memory is available to the cgroup.

By default, memory isn’t limited; the number shown represents all the memory available to the virtual machine I’m using to run this container.

With no limit set, a process is allowed to consume as much memory as it likes. If the process has a memory leak, an attacker could take advantage of it in a resource exhaustion attack, deliberately using as much memory as possible. You can mitigate this kind of attack, and ensure that other processes can carry on as normal, by setting limits on memory and other resources.

For a Docker container, you can restrict memory by running the command below. Kubernetes offers an equivalent setting through a container’s resource limits.

docker run -m 512m -it --rm -d -p 8080:80 --name web nginx

-m (or --memory): the maximum amount of memory the container can use

Now you’ll find that the memory.limit_in_bytes parameter is approximately what you configured as the limit.
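To check the value you see in the file against what you asked for, it helps to convert the Docker-style size into bytes. The helper below is only a sketch: size_to_bytes is my own name, it handles just the b, k, m and g suffixes, and it isn’t Docker’s parser. The file may hold a slightly different number if the kernel rounds the limit to a multiple of the page size, which would explain the "approximately".

```shell
# Convert a size such as 512m or 1g to bytes (binary multiples).
# $1: size with an optional b, k, m or g suffix
size_to_bytes() {
    num="${1%[bkmgBKMG]}"          # strip a single trailing unit letter
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;      # plain bytes, with or without "b"
    esac
}

# Example:
#   size_to_bytes 512m    # prints 536870912
```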

Assigning a Process to a Cgroup

Similar to setting resource limits, assigning a process to a cgroup is a simple matter of writing its process ID into the cgroup.procs file for the cgroup.

The command below writes the process ID 1430 (the process ID of a shell) into the cgroup.procs file:

The shell is now a member of the cgroup, with its memory limited to a little under 100kB. When I run the ls command from that shell, the process is killed as it attempts to exceed the memory limit.
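Put together, the assignment step can be sketched as a one-line helper. The function name cgroup_assign, the group name demo and the value 100000 in the comments are assumptions for illustration; on a real host you’d run this as root against a cgroup you created yourself.

```shell
# Move a process into a cgroup by writing its PID to cgroup.procs.
# $1: cgroup directory, e.g. /sys/fs/cgroup/memory/demo
# $2: process ID to move
cgroup_assign() {
    printf '%s\n' "$2" > "$1/cgroup.procs"
}

# Example (as root): set a small limit on the hypothetical demo cgroup,
# then move the current shell ($$) into it.
#   echo 100000 > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
#   cgroup_assign /sys/fs/cgroup/memory/demo $$
```

Child processes inherit the cgroup of their parent, which is why a command started from the limited shell is subject to the same limit.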

Cgroups V2

Version 2 of cgroups has been available in the Linux kernel since 2016. The biggest difference is that in cgroups v2 a process can’t join different groups for different controllers. In v1, a process could join /sys/fs/cgroup/memory/mygroup and /sys/fs/cgroup/pids/yourgroup. In v2, the process joins /sys/fs/cgroup/ourgroup and is subject to all the controllers enabled for ourgroup.
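If you’re not sure which version a host is using, the layout of /sys/fs/cgroup gives it away. The check below is a heuristic sketch, not an official test: it relies on v2 placing a cgroup.controllers file at the root of its single hierarchy, and on v1 having one directory per controller. The function name cgroup_version is my own.

```shell
# Report which cgroup layout is mounted under a given root.
# $1: mount point to inspect (defaults to /sys/fs/cgroup)
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2        # single unified hierarchy
    elif [ -d "$root/memory" ]; then
        echo v1        # one hierarchy directory per controller
    else
        echo unknown
    fi
}

# Example:
#   cgroup_version
```

On a v2 host, the memory file names also differ from the ones used in this post; for instance, the limit lives in memory.max rather than memory.limit_in_bytes.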

Akihiro Suda summarized the new version here.
