From: David Hildenbrand <david@redhat.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Peter Xu <peterx@redhat.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Minchan Kim <minchan@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Rik van Riel <riel@surriel.com>, Michal Hocko <mhocko@kernel.org>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: Page zapping and page table reclaim
Date: Mon, 22 Mar 2021 10:34:27 +0100
Message-ID: <53b83aae-01d7-156d-d1bb-836002521005@redhat.com>
In-Reply-To: <CAHbLzkri0u076G2qsAjKGKzTQzMjTVt3Kxoq_-_3GU4-HoOoMQ@mail.gmail.com>

On 19.03.21 18:04, Yang Shi wrote:
> On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 11.03.21 22:26, Peter Xu wrote:
>>> On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
>>>> I was wondering, is there any mechanism that reclaims basically empty page
>>>> tables in a running process?
>>>
>>> Would munmap() count? :)
>>
>> Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
>>
>> As so often lately, the use case is sparse memory mappings where we
>>
>> a) may want to reuse the area later.
>> b) don't want to hold the mmap lock in write while optimizing
>> c) don't want to create a lot of individual mappings that we might not
>> be able to merge again.
>
> Will the below work for you?
>
> 1. acquire write mmap lock
> 2. unlink vmas from the list and rbtree (so the vmas won't be visible
> to any concurrent readers/writers)
> 3. downgrade write lock to read lock
> 4. zap page tables and free page tables
> 5. upgrade to write lock
> 6. relink vmas back to list and rbtree
>
> Actually the current implementation of munmap() does the first 5 steps.

That's almost mmap(MAP_FIXED) for the cases where we can merge VMAs. But
I don't think this is actually what we want: we don't want to do such
optimizations while we're in MADV_DONTNEED and similar paths that only
hold the mmap lock in read mode.

Simple example: QEMU implements memory ballooning for its VMs via
virtio-balloon. When the guest inflates/deflates 4k pages and we're
using anonymous memory, we issue a madvise(MADV_DONTNEED) syscall for
each 4k page. At some point, we might be able to reclaim page tables,
but we don't want to suddenly take the mmap lock in write mode during
madvise() when there is no actual memory pressure, or scan for
optimization opportunities during every syscall. User space pretty much
relies on madvise(MADV_DONTNEED) being fast and minimally intrusive.
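
To illustrate the per-page discard pattern, here is a minimal user-space
sketch. It is not QEMU's actual code: discard_page() and
balloon_discard_demo() are made-up helper names, and the 16-page region
and the 0xaa fill pattern are arbitrary. It only shows that the discarded
page of a private anonymous mapping reads back as zeroes while its
neighbours keep their contents; whether the page table backing the range
stays allocated afterwards is not something the sketch checks.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: discard a single page of an anonymous private
 * mapping, roughly what one balloon request for one page boils down to.
 */
static int discard_page(unsigned char *base, size_t page_index,
                        size_t page_size)
{
    assert(page_size != 0);
    return madvise(base + page_index * page_size, page_size, MADV_DONTNEED);
}

/*
 * Hypothetical demo: returns 0 if the discarded page reads back as zero
 * and the neighbouring pages are untouched, -1 on any failure.
 */
static int balloon_discard_demo(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;

    size_t page_size = (size_t)ps;
    size_t npages = 16; /* arbitrary size for the sketch */

    unsigned char *base = mmap(NULL, npages * page_size,
                               PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Populate every page so there is something to discard. */
    memset(base, 0xaa, npages * page_size);

    int ret = -1;
    if (discard_page(base, 3, page_size) == 0 &&
        base[3 * page_size] == 0 &&     /* discarded page is zero-filled */
        base[2 * page_size] == 0xaa &&  /* neighbours keep their data */
        base[4 * page_size] == 0xaa)
        ret = 0;

    munmap(base, npages * page_size);
    return ret;
}
```

Calling balloon_discard_demo() from a test program should return 0 on
Linux. Note that the discard itself needs neither a new mapping nor a
VMA split.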

I think there might be other cases where we can reclaim page tables as
well, not necessarily triggered by user space. For example, after we
wrote back/evicted a sequence of file-mapped pages, I would assume that
we might also be able to reclaim page tables, but I haven't looked into
it yet. For now, I mostly care about page table reclaim for the cases
where we discard pages from page tables completely (MADV_DONTNEED,
MADV_FREE, MADV_REMOVE, fallocate(FALLOC_FL_PUNCH_HOLE)).

I envision page table reclaim happening asynchronously: either
periodically, once under memory pressure, or once there is sufficient
evidence that reclaim might make sense. There, similarly to khugepaged,
we might have to temporarily take the mmap lock in write mode for a
short period of time, but I'll have to look into the details first.

--
Thanks,
David / dhildenb