linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH] mm/mprotect: allow unfaulted VMAs to be unaccounted on mprotect()
Date: Tue, 27 Jun 2023 08:59:33 +0200	[thread overview]
Message-ID: <1832a772-93b4-70ad-3a2c-d8da104ce8dd@redhat.com> (raw)
In-Reply-To: <074fc253-beb4-f7be-14a1-ee5f4745c15b@suse.cz>

Hi all,

On 27.06.23 08:28, Vlastimil Babka wrote:
> On 6/26/23 22:46, Lorenzo Stoakes wrote:
>> When mprotect() is used to make unwritable VMAs writable, they have the
>> VM_ACCOUNT flag applied and memory accounted accordingly.
>>
>> If the VMA has had no pages faulted in and is then made unwritable once
>> again, it will remain accounted for, despite not being capable of extending
>> memory usage.
>>
>> Consider:-
>>
>> ptr = mmap(NULL, page_size * 3, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
>> mprotect(ptr + page_size, page_size, PROT_READ | PROT_WRITE);
>> mprotect(ptr + page_size, page_size, PROT_READ);
> 
> In the original Mike's example there were actual pages populated, in that
> case we still won't merge the vma's, right? Guess that can't be helped.
>

I am clearly missing the motivation for this patch: above is a 
artificial reproducer that I cannot really imagine being relevant in 
practice.

So is there any sane workload that does random mprotect() without even 
touching memory once? Sure, fuzzing, ... artificial reproducers ... but 
is there any real-life problem we're solving here?

IOW, why did you (Lorenzo) invest time optimizing for this andcrafting 
this patch and why should reviewer invest time to understand if it's 
correct? :)


>> The first mprotect() splits the range into 3 VMAs and the second fails to
>> merge the three as the middle VMA has VM_ACCOUNT set and the others do not,
>> rendering them unmergeable.
>>
>> This is unnecessary, since no pages have actually been allocated and the
>> middle VMA is not capable of utilising more memory, thereby introducing
>> unnecessary VMA fragmentation (and accounting for more memory than is
>> necessary).
>>
>> Since we cannot efficiently determine which pages map to an anonymous VMA,
>> we have to be very conservative - determining whether any pages at all have
>> been faulted in, by checking whether vma->anon_vma is NULL.
>>
>> We can see that the lack of anon_vma implies that no anonymous pages are
>> present as evidenced by vma_needs_copy() utilising this on fork to
>> determine whether page tables need to be copied.
>>
>> The only place where anon_vma is set NULL explicitly is on fork with
>> VM_WIPEONFORK set, however since this flag is intended to cause the child
>> process to not CoW on a given memory range, it is right to interpret this
>> as indicating the VMA has no faulted-in anonymous memory mapped.
>>
>> If the VMA was forked without VM_WIPEONFORK set, then anon_vma_fork() will
>> have ensured that a new anon_vma is assigned (and correctly related to its
>> parent anon_vma) should any pages be CoW-mapped.
>>
>> The overall operation is safe against races as we hold a write lock against
>> mm->mmap_lock.
>>
>> If we could efficiently look up the VMA's faulted-in pages then we would
>> unaccount all those pages not yet faulted in. However as the original
>> comment alludes this simply isn't currently possible, so we remain
>> conservative and account all pages or none at all.
>>
>> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> 
> So in practice programs will likely do the PROT_WRITE in order to actually
> populate the area, so this won't trigger as I commented above. But it can
> still help in some cases and is cheap to do, so:

IMHO we should much rather look into getting hugetlb ranges merged. Mt 
recollection is that we'll never end up merging hugetlb VMAs once split.

This patch adds code without a clear motivation. Maybe there is a good 
motivation?

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2023-06-27  6:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 20:46 Lorenzo Stoakes
2023-06-27  6:28 ` Vlastimil Babka
2023-06-27  6:59   ` David Hildenbrand [this message]
2023-06-27  8:49     ` Lorenzo Stoakes
2023-06-27  9:13       ` David Hildenbrand
2023-06-27  9:47         ` Lorenzo Stoakes
2023-06-27 10:18           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1832a772-93b4-70ad-3a2c-d8da104ce8dd@redhat.com \
    --to=david@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=rppt@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox