* [LSF/MM/BPF TOPIC] page table reclaim
@ 2022-02-22 8:56 David Hildenbrand
2022-02-24 20:09 ` Matthew Wilcox
From: David Hildenbrand @ 2022-02-22 8:56 UTC
To: lsf-pc, linux-mm
Hi all,
we are aware of workloads that can trigger allocation of a lot of page
tables that are essentially unnecessary. The obvious candidates are
processes that dynamically manage memory consumption in large, sparse
memory mappings, e.g., via madvise(MADV_DONTNEED): hypervisors that
implement memory ballooning or virtio-mem, and memory allocators.
In fact, it's easy to have a process that consumes almost nothing but
page tables, and it's hard to distinguish between "malicious" and
"sane" workloads when just looking at the page table consumption. I have
quite a few neat examples that I can present.
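
To put rough numbers on this, here is a back-of-the-envelope sketch. It
assumes x86-64 with 4 KiB base pages and 512 entries per table (so one
PTE table maps 2 MiB and one PMD table maps 1 GiB), a mapping aligned to
those boundaries, and a workload that touches at least one page in every
2 MiB chunk before zapping the data again. The helper names are made up
for illustration.

```c
#include <stdint.h>

/* Assumed layout (not stated in the mail): x86-64, 4 KiB base pages,
 * 512 eight-byte entries per page table page. */
#define PT_PAGE_BYTES     4096ULL
#define PT_ENTRIES        512ULL
#define PTE_TABLE_SPAN    (PT_PAGE_BYTES * PT_ENTRIES)    /* 2 MiB */
#define PMD_TABLE_SPAN    (PTE_TABLE_SPAN * PT_ENTRIES)   /* 1 GiB */

/* Bytes of PTE tables allocated when every 2 MiB chunk of the mapping
 * has been touched at least once. */
uint64_t pte_table_bytes(uint64_t mapping_bytes)
{
    uint64_t tables = (mapping_bytes + PTE_TABLE_SPAN - 1) / PTE_TABLE_SPAN;
    return tables * PT_PAGE_BYTES;
}

/* Bytes of PMD tables needed to reference those PTE tables. */
uint64_t pmd_table_bytes(uint64_t mapping_bytes)
{
    uint64_t tables = (mapping_bytes + PMD_TABLE_SPAN - 1) / PMD_TABLE_SPAN;
    return tables * PT_PAGE_BYTES;
}
```

Under these assumptions, a 1 TiB sparse mapping ends up with 2 GiB of
PTE tables and 4 MiB of PMD tables. If the data pages are then dropped
with madvise(MADV_DONTNEED) and the page tables are not reclaimed, the
process holds roughly 2 GiB of unmovable memory and almost no data.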
Page tables are unmovable in memory and cannot get swapped out. So heavy
page table consumption isn't only problematic because we end up wasting
system RAM and fragmenting it with unmovable allocations; it's also a
problem when big portions of system RAM are managed by CMA/ZONE_MOVABLE,
where we can simply run out of system RAM available for unmovable
allocations and eventually harm the system or other workloads on the
same machine.
One approach I'd like to discuss is page table reclaim: reclaiming
unnecessary page tables, which involves a lot of challenges.
1. Efficient page table reclaim
"Ripping out" a page table is an expensive and highly complicated
operation: just take a look at khugepaged. We have to block all page
table walkers, which requires the mmap_lock in write mode, the rmap
lock, and proper synchronization with GUP-fast.
In the simplest approach, we'd scan for candidate page tables to then
rip them out. But:
* How to scan for candidate page tables efficiently?
* How to avoid the mmap_lock in write mode when removing a page table?
* How to avoid the rmap lock (just imagine a page table spanning
multiple rmaps)?
But also: how to make the implementation simple and appealing enough to
get merged upstream? For example, the last attempt to reclaim empty PTE
page tables [1] automatically once the last PTE was zapped has not been
merged yet because it certainly adds complexity. How to avoid that
complexity?
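
The "free the table once the last PTE was zapped" idea can be
illustrated with a toy userspace model. This is only a sketch of the
counting principle; it is not the code from [1], and every name below is
hypothetical. It deliberately ignores everything that makes the kernel
problem hard: concurrent page table walkers, GUP-fast, rmap, and TLB
flushing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PTRS_PER_PTE 512

/* Toy model of one PTE page table. An entry value of 0 means "none". */
struct toy_pte_table {
    uint64_t entries[TOY_PTRS_PER_PTE];
    unsigned int populated;              /* number of non-zero entries */
};

/* Allocate a zeroed (empty) table; returns NULL on allocation failure. */
struct toy_pte_table *toy_table_alloc(void)
{
    return calloc(1, sizeof(struct toy_pte_table));
}

/* Install a non-zero entry and keep the population count in sync. */
void toy_set_pte(struct toy_pte_table *t, size_t idx, uint64_t val)
{
    if (val == 0)
        return;                          /* clearing goes through toy_zap_pte() */
    if (t->entries[idx] == 0)
        t->populated++;
    t->entries[idx] = val;
}

/* Zap one entry. If the table is empty afterwards, free it, set *tp to
 * NULL, and return true; otherwise return false. */
bool toy_zap_pte(struct toy_pte_table **tp, size_t idx)
{
    struct toy_pte_table *t = *tp;

    if (t->entries[idx] != 0) {
        t->entries[idx] = 0;
        t->populated--;
    }
    if (t->populated == 0) {
        free(t);
        *tp = NULL;
        return true;
    }
    return false;
}
```

The counter makes "is this table empty?" an O(1) check at zap time, so
no separate scan for candidates is needed. The price is extra
bookkeeping on every path that installs or clears an entry, which is
where the complexity mentioned above comes from.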
2. Who triggers reclaim and when?
Letting an application trigger reclaim of page tables is the "easiest
solution": let's imagine madvise(MADV_RECLAIM_PGTABLES). However, this
doesn't take care of malicious workloads and is more problematic when
having sparse files mapped into multiple processes. Further, there is no
need to reclaim if we're not under memory pressure.
Letting the system do this automatically looks "cleaner". But, when to
start reclaiming? How to detect and handle malicious processes (do we
care?)? How to set an adequate soft/hard limit?
3. Which page tables to reclaim?
While the obvious candidates are empty page tables, we can easily have
page tables all filled with the shared zeropage instead. Once again,
there are sane and malicious use cases. A sane use case is a simple VM
having a balloon inflated and triggering a memory dump like kdump: we'll
populate the shared zeropage everywhere and have plenty of page tables
we don't even care about.
But once we talk about reclaiming page tables that are still populated
with the shared zeropage, why not reclaim page tables that are
"reconstructable", for example, because they don't map anonymous pages
and don't require special fault handling (userfaultfd?)?
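
The three candidate classes above (empty, zeropage-only,
"reconstructable") can be written down as a small classifier. This is a
toy model under loud assumptions: the entry kinds and names are invented
for illustration, and "special fault handling" is modeled per entry,
whereas in the kernel it would be a property of the VMA.

```c
#include <stddef.h>

/* Hypothetical, simplified view of what a PTE can hold. */
enum toy_pte_kind {
    TOY_PTE_NONE,       /* nothing mapped */
    TOY_PTE_ZEROPAGE,   /* shared zeropage */
    TOY_PTE_FILE,       /* file-backed page that can be faulted in again */
    TOY_PTE_ANON,       /* anonymous page: the mapping must be kept */
    TOY_PTE_SPECIAL     /* needs special fault handling, e.g. userfaultfd */
};

enum toy_table_class {
    TOY_TABLE_EMPTY,            /* obvious candidate */
    TOY_TABLE_ZEROPAGE_ONLY,    /* only none entries and the shared zeropage */
    TOY_TABLE_RECONSTRUCTABLE,  /* could be rebuilt by ordinary faults */
    TOY_TABLE_KEEP              /* must not be reclaimed in this model */
};

/* Classify one page table given the kinds of its n entries. */
enum toy_table_class toy_classify(const enum toy_pte_kind *ptes, size_t n)
{
    int saw_zeropage = 0;
    int saw_file = 0;

    for (size_t i = 0; i < n; i++) {
        switch (ptes[i]) {
        case TOY_PTE_NONE:
            break;
        case TOY_PTE_ZEROPAGE:
            saw_zeropage = 1;
            break;
        case TOY_PTE_FILE:
            saw_file = 1;
            break;
        default:
            /* Anonymous or special entries pin the table. */
            return TOY_TABLE_KEEP;
        }
    }
    if (saw_file)
        return TOY_TABLE_RECONSTRUCTABLE;
    if (saw_zeropage)
        return TOY_TABLE_ZEROPAGE_ONLY;
    return TOY_TABLE_EMPTY;
}
```

Each class is strictly more aggressive than the previous one, so a
reclaim policy could start with empty tables and only move on to the
other classes if that does not free enough memory.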
While I do have answers to some of the questions and various ideas, it's
certainly an interesting topic to discuss and brainstorm.
[1] https://lkml.kernel.org/r/20211110105428.32458-1-zhengqi.arch@bytedance.com
--
Thanks,
David / dhildenb
* Re: [LSF/MM/BPF TOPIC] page table reclaim
From: Matthew Wilcox @ 2022-02-24 20:09 UTC
To: David Hildenbrand; +Cc: lsf-pc, linux-mm
On Tue, Feb 22, 2022 at 09:56:20AM +0100, David Hildenbrand wrote:
> 2. Who triggers reclaim and when?
>
> Letting an application trigger reclaim of page tables is the "easiest
> solution": let's imagine madvise(MADV_RECLAIM_PGTABLES). However, this
> doesn't take care of malicious workloads and is more problematic when
> having sparse files mapped into multiple processes. Further, there is no
> need to reclaim if we're not under memory pressure.
>
> Letting the system do this automatically looks "cleaner". But, when to
> start reclaiming? How to detect and handle malicious processes (do we
> care?)? How to set an adequate soft/hard limit?
I don't think we care about the difference between users that are
performing useful work with an inefficient page table strategy and users
that are trying to break the page table usage scheme. We have to account
the page tables to each process (which we already do), and a process which
is, say, trying to allocate memory might be shunted off to a path where
it tries to reclaim its own page tables if it has a lot on the books.
Particularly if it's trying to allocate memory for more page tables ;-)
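
The per-process accounting is visible from userspace as the VmPTE field
in /proc/<pid>/status. A minimal sketch of how a "lot on the books"
check could look when built on that number is below. The parser works on
text in that file's format; the threshold policy and both function names
are assumptions made up for illustration, not something proposed in this
thread.

```c
#include <stdio.h>
#include <string.h>

/* Extract the VmPTE value (in kB) from text in /proc/<pid>/status
 * format. Returns -1 if no VmPTE line is present. */
long parse_vmpte_kb(const char *status_text)
{
    const char *line = status_text;

    while (line && *line) {
        long kb;

        /* Whitespace in the format matches the tab and padding spaces. */
        if (sscanf(line, "VmPTE: %ld kB", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line)
            line++;                      /* step past the newline */
    }
    return -1;
}

/* Hypothetical policy: ask a process to reclaim its own page tables
 * when they exceed max_percent of its resident set size. */
int should_reclaim_own_pgtables(long pgtable_kb, long rss_kb, long max_percent)
{
    if (pgtable_kb <= 0)
        return 0;                        /* nothing to reclaim */
    if (rss_kb <= 0)
        return 1;                        /* page tables but no resident data */
    return pgtable_kb * 100 > rss_kb * max_percent;
}
```

A ratio rather than an absolute limit treats the "useful but
inefficient" and the "malicious" process the same way, in line with the
point above that the difference does not matter: only the amount of page
tables relative to the memory actually in use does.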
> While I do have answers to some of the questions and various ideas, it's
> certainly an interesting topic to discuss and brainstorm.
Indeed; it interests me too.