From: Yang Shi <shy828301@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Shakeel Butt <shakeelb@google.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Zi Yan <ziy@nvidia.com>, Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
Date: Mon, 29 Nov 2021 13:32:41 -0800 [thread overview]
Message-ID: <CAHbLzkoP-YTf_=OKatGbVosm6PPGT9TnY1g8_E4v4X5GE_m6HQ@mail.gmail.com> (raw)
In-Reply-To: <f464115b-f332-9f13-89c4-81bf6b282975@redhat.com>
On Fri, Nov 26, 2021 at 1:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> >>
> >> Thanks for making me rerun this and yes indeed I had a very silly bug in the
> >> benchmark code (i.e. madvise the same page for the whole loop) and this is
> >> indeed several times slower than without the patch (sorry David for misleading
> >> you).
>
> No worries, BUGs happen :)
>
> >>
> >> To better understand what is happening, I profiled the benchmark:
> >>
> >> - 31.27% 0.01% dontneed [kernel.kallsyms] [k] zap_page_range_sync
> >> - 31.27% zap_page_range_sync
> >> - 30.25% split_local_deferred_list
> >> - 30.16% split_huge_page_to_list
> >> - 21.05% try_to_migrate
> >> + rmap_walk_anon
> >> - 7.47% remove_migration_ptes
> >> + 7.34% rmap_walk_locked
> >> + 1.02% zap_page_range_details
> >
> > Makes sense, thanks for verifying it, Shakeel. I forgot it'll also walk
> > itself.
> >
> > I believe this effect will be exaggerated when the mapping is shared,
> > e.g. shmem file thp between processes. What's worse is that when one process
> > DONTNEED one 4k page, all the rest mms will need to split the huge pmd without
> > even being noticed, so that's a direct suffer from perf degrade.
>
> Would this really apply to MADV_DONTNEED on shmem, and would deferred
> splitting apply on shmem? I'm constantly confused about shmem vs. anon,
> but I would have assumed that shmem is fd-based and we wouldn't end up
> in rmap_walk_anon. For shmem, the pagecache would contain the THP which
> would stick around and deferred splits don't even apply.
The deferred split is anon THP only, it doesn't apply to shmem THP.
For the rmap walk, there are two ramp walks for anon THP, but just one
for shmem THP. Both needs one rmap walk to unmap the THP before doing
actual split, but anon THP needs another rmap walk to reinstall PTEs
for still mapped pages, however this is not needed for shmem pages
since they could be reached via page cache.
>
> But again, I'm constantly confused so I'd love to be enlighted.
>
> >
> >>
> >> The overhead is not due to copying page flags but rather due to two rmap walks.
> >> I don't think this much overhead is justified for current users of MADV_DONTNEED
> >> and munmap. I have to rethink the approach.
>
> Most probably not.
>
> >
> > Some side notes: I digged out the old MADV_COLLAPSE proposal right after I
> > thought about MADV_SPLIT (or any of its variance):
> >
> > https://lore.kernel.org/all/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
> >
> > My memory was that there's some issue to be solved so that was blocked, however
> > when I read the thread it sounds like the list was mostly reaching a consensus
> > on considering MADV_COLLAPSE being beneficial. Still copying DavidR in case I
> > missed something important.
> >
> > If we think MADV_COLLAPSE can help to implement an userspace (and more
> > importantly, data-aware) khugepaged, then MADV_SPLIT can be the other side of
> > kcompactd, perhaps.
> >
> > That's probably a bit off topic of this specific discussion on the specific use
> > case, but so far it seems all reasonable and discussable.
>
> User space can trigger a split manually using some MADV hackery. But it
> can only be used for the use case here, where we actually want to zap a
> page.
>
> 1. MADV_FREE a single 4k page in the range. This will split the PMD->PTE
> and the compound page.
> 2. MADV_DONTNEED either the complete range or the single 4k page.
>
> --
> Thanks,
>
> David / dhildenb
>
next prev parent reply other threads:[~2021-11-29 21:33 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-20 20:12 Shakeel Butt
2021-11-21 4:35 ` Matthew Wilcox
2021-11-21 5:25 ` Shakeel Butt
2021-11-22 0:50 ` Kirill A. Shutemov
2021-11-22 3:42 ` Shakeel Butt
2021-11-22 4:56 ` Matthew Wilcox
2021-11-22 9:19 ` David Hildenbrand
2021-12-08 13:23 ` Pankaj Gupta
2021-11-22 8:32 ` David Hildenbrand
2021-11-22 18:40 ` Shakeel Butt
2021-11-22 18:59 ` David Hildenbrand
2021-11-23 1:20 ` Shakeel Butt
2021-11-23 16:56 ` David Hildenbrand
2021-11-23 17:17 ` Shakeel Butt
2021-11-23 17:20 ` David Hildenbrand
2021-11-23 17:24 ` Shakeel Butt
2021-11-23 17:26 ` David Hildenbrand
2021-11-23 17:28 ` Shakeel Butt
2021-11-25 10:09 ` Peter Xu
2021-11-25 17:14 ` Shakeel Butt
2021-11-26 0:00 ` Peter Xu
2021-11-25 10:24 ` Peter Xu
2021-11-25 10:32 ` David Hildenbrand
2021-11-26 2:52 ` Peter Xu
2021-11-26 9:04 ` David Hildenbrand
2021-11-29 22:00 ` Yang Shi
2021-11-26 3:21 ` Shakeel Butt
2021-11-26 4:12 ` Peter Xu
2021-11-26 9:16 ` David Hildenbrand
2021-11-26 9:39 ` Peter Xu
2021-11-29 21:32 ` Yang Shi [this message]
2022-01-24 18:48 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHbLzkoP-YTf_=OKatGbVosm6PPGT9TnY1g8_E4v4X5GE_m6HQ@mail.gmail.com' \
--to=shy828301@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=shakeelb@google.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox