Dynamically reserving swap space for MAP_NORESERVE mappings


From: David Hildenbrand <david@redhat.com>
To: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Dynamically reserving swap space for MAP_NORESERVE mappings
Date: Fri, 12 Feb 2021 14:00:00 +0100
Message-ID: <989ec2d2-efe9-6608-b132-3167878aacb3@redhat.com>

Hi,

I'm planning on making use of MAP_NORESERVE for sparse memory regions,
but I still want to have some way to reduce the chance of running into
random OOMs, similar to the protection we get without MAP_NORESERVE on
private mappings. I want dynamic reservations of swap space.

The rough idea is having a large mmap(MAP_NORESERVE) area in which I
dynamically populate/discard memory to control the memory consumption,
similar to a memory allocator - but rather in the context of dynamically
resizing VMs. In case the user requests a dangerous configuration ("add
50GB" instead of "add 5GB"), I would rather fail early in a nice way
and disallow growing the VM instead of crashing the VM later on.
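
A minimal userspace sketch of the populate/discard pattern described
above, assuming a private anonymous mapping. The helper names
(sparse_region_*) are made up for illustration; populating by writing
zeroes and discarding via MADV_DONTNEED is one possible choice, not
necessarily what a real VMM would do.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Reserve address space only. With MAP_NORESERVE, no swap space is
 * reserved for the range up front (under the default heuristic
 * overcommit mode). Returns NULL on failure.
 */
static void *sparse_region_create(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Populate: write to the range so that pages actually get allocated. */
static void sparse_region_populate(void *base, size_t offset, size_t len)
{
	memset((char *)base + offset, 0, len);
}

/*
 * Discard: drop the pages again. base + offset must be page-aligned.
 * The next access to the range sees fresh zero-filled pages.
 */
static int sparse_region_discard(void *base, size_t offset, size_t len)
{
	return madvise((char *)base + offset, len, MADV_DONTNEED);
}
```

Nothing in this sketch can fail "nicely" at populate time: if memory
and swap run out while touching the pages, the process gets killed,
which is exactly the situation a dynamic reservation is meant to avoid.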

For anything file-backed (MAP_SHARED) this is fairly easy: fallocate() 
can preallocate memory. If it fails, there is not sufficient backing 
storage. (it might be nice to also only reserve and not preallocate for 
hugetlbfs, but that's another story)
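
For illustration, the file-backed case could look roughly like the
following, assuming a memfd (shmem) as backing file. The helper names
and the "vm-ram" label are hypothetical; only memfd_create(),
ftruncate() and fallocate() are real interfaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a shmem-backed file of 'size' bytes without allocating any
 * pages yet. Returns the fd, or -1 on error.
 */
static int backing_create(off_t size)
{
	int fd = memfd_create("vm-ram", MFD_CLOEXEC); /* name is arbitrary */

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Preallocate [offset, offset + len). Returns 0 on success or a
 * negative errno value (e.g., -ENOSPC) if there is not sufficient
 * backing storage, so the caller can refuse to grow.
 */
static int backing_grow(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, 0, offset, len) < 0)
		return -errno;
	return 0;
}

/* Give the pages back while keeping the file size unchanged. */
static int backing_shrink(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) < 0)
		return -errno;
	return 0;
}
```

The point is that backing_grow() reports failure synchronously, before
any of the memory is handed to the VM.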

For anonymous memory / MAP_PRIVATE it's complicated. I want to avoid any
kind of remapping (mmap(MAP_FIXED | !MAP_NORESERVE)) within the sparse
region, as it is expensive, I can easily run into mapping limits,
and it creates quite some problems with other features that are
enabled in parallel (e.g., userfaultfd).
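
To make the mapping-limit concern concrete: every remapped chunk with
different attributes inside the sparse region becomes its own VMA, and
the number of VMAs per process is bounded by vm.max_map_count. A small
sketch to compare the two numbers; the function names are made up, the
procfs paths are the standard ones.

```c
#include <stdio.h>

/* Returns the vm.max_map_count sysctl value, or -1 if unreadable. */
static long read_max_map_count(void)
{
	long val = -1;
	FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Returns the number of mappings of the calling process by counting
 * the lines in /proc/self/maps (one line per VMA), or -1 on error.
 */
static long count_self_mappings(void)
{
	long lines = 0;
	int c;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}
```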

So I actually want to decide myself how much memory is reserved, have a 
way to increase it (and fail if impossible) or decrease it. Doing this 
per VMA is not possible, as it's unclear what to do on VMA 
splits/unmappings.

One idea is concurrently resizing a parallel, pre-reserved
mmap(MAP_PRIVATE|MAP_ANON) area, which would fail when trying to grow it
via mmap(MAP_FIXED) and there is not sufficient swap. This feels like
the wrong way to achieve the goal, and it might fail due to per-process
limits.
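
A rough sketch of how such a parallel area could be driven from user
space. Everything here is an assumption for illustration: the
struct/function names are made up, the area is never touched (it only
exists for its accounting), and whether growing actually fails depends
on the overcommit mode (under the default heuristic mode, only obvious
overcommits are refused).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

struct shadow_resv {
	char *base;	/* start of the placeholder range */
	size_t max;	/* size of the placeholder range in bytes */
	size_t cur;	/* bytes currently backed by an accounted mapping */
};

/* Reserve address space only: PROT_NONE mappings are not accounted. */
static int shadow_init(struct shadow_resv *s, size_t max)
{
	void *p = mmap(NULL, max, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (p == MAP_FAILED)
		return -errno;
	s->base = p;
	s->max = max;
	s->cur = 0;
	return 0;
}

/* Placeholder flags/protection used for the unaccounted part. */
static void *shadow_placeholder(char *addr, size_t len)
{
	return mmap(addr, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS |
		    MAP_NORESERVE | MAP_FIXED, -1, 0);
}

/* Set the accounted size to 'want' bytes (must be page-aligned). */
static int shadow_resize(struct shadow_resv *s, size_t want)
{
	int err;

	if (want > s->max)
		return -EINVAL;
	if (want == s->cur)
		return 0;
	if (want < s->cur) {
		/* Shrink: replace the tail by an unaccounted placeholder. */
		if (shadow_placeholder(s->base + want, s->cur - want) ==
		    MAP_FAILED)
			return -errno;
	} else if (mmap(s->base + s->cur, want - s->cur,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) ==
		   MAP_FAILED) {
		/*
		 * Grow failed: a writable private mapping without
		 * MAP_NORESERVE is accounted, and the accounting said no.
		 * The failed MAP_FIXED may already have removed the old
		 * placeholder, so try to re-establish it.
		 */
		err = errno;
		shadow_placeholder(s->base + s->cur, want - s->cur);
		return -err;
	}
	s->cur = want;
	return 0;
}
```

Every resize goes through mmap(MAP_FIXED), which is the kind of
remapping the sparse region itself is supposed to avoid; here it at
least stays confined to an area nobody else uses.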

My naive approach would be having a syscall that allows for 
increasing/decreasing an additional per-process reservation like:

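/* delta is a signed number of pages; 0 is a no-op. */
if (!delta)
	return 0;
if (mmap_write_lock_killable(mm))
	return -EINTR;
if (delta > 0) {
	/* Growing: charge the pages, fail if they cannot be committed. */
	if (security_vm_enough_memory_mm(mm, delta)) {
		mmap_write_unlock(mm);
		return -ENOMEM;
	}
} else {
	/* Shrinking: must release less than the current reservation. */
	if (-delta >= mm->extra_nr_accounted) {
		mmap_write_unlock(mm);
		return -EINVAL;
	}
	/* Uncharge the released pages. */
	vm_unacct_memory(-delta);
}
mm->extra_nr_accounted += delta;
mmap_write_unlock(mm);
return 0;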

Or setting an explicit reservation instead / being able to observe the 
current reservation.

We could limit it to the actual size of all VMAs that are not accounted
due to MAP_NORESERVE, so we would implicitly check for may_expand_vm(),
as that has been checked when the mmap(MAP_NORESERVE) was created. Of
course, we would have to update the limit when unmapping applicable
MAP_NORESERVE areas (will have to think about temporary remappings in
user space). Not sure if that is required, but it feels like there
should be an upper limit besides the one in
security_vm_enough_memory_mm().
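
To illustrate the upper-limit idea, here is a toy userspace model of
the bookkeeping. It is not kernel code: struct mm_model, its fields
and both functions are hypothetical, the real commit check is left
out, and clamping the reservation on unmap is just one possible
policy.

```c
#include <errno.h>

/* Toy stand-in for the relevant per-process state. */
struct mm_model {
	long noreserve_pages;	 /* pages in unaccounted MAP_NORESERVE VMAs */
	long extra_nr_accounted; /* pages reserved via the new interface */
};

/*
 * Adjust the extra reservation by 'delta' pages, capped by the total
 * size of all unaccounted MAP_NORESERVE VMAs.
 */
static int model_adjust_reservation(struct mm_model *mm, long delta)
{
	long next = mm->extra_nr_accounted + delta;

	if (next < 0)
		return -EINVAL;	/* cannot release more than reserved */
	if (next > mm->noreserve_pages)
		return -ENOMEM;	/* would exceed the MAP_NORESERVE size */
	mm->extra_nr_accounted = next;
	return 0;
}

/*
 * A MAP_NORESERVE VMA of 'pages' pages goes away: lower the cap and
 * clamp the reservation. Returns the number of pages that would have
 * to be unaccounted as a result.
 */
static long model_unmap_noreserve(struct mm_model *mm, long pages)
{
	long excess = 0;

	mm->noreserve_pages -= pages;
	if (mm->extra_nr_accounted > mm->noreserve_pages) {
		excess = mm->extra_nr_accounted - mm->noreserve_pages;
		mm->extra_nr_accounted = mm->noreserve_pages;
	}
	return excess;
}
```

With such a clamp, a temporary unmap/remap of a MAP_NORESERVE area in
user space would silently drop part of the reservation, which is the
open question mentioned above.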

Which other limits do we have that we would have to consider?

Alternatives? Thoughts? Am I missing something important?

Thanks!

-- 
Thanks,

David / dhildenb
