From: Alistair Popple <apopple@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
David Hildenbrand <david@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Yang Shi <shy828301@gmail.com>, Vlastimil Babka <vbabka@suse.cz>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A . Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified
Date: Fri, 3 Dec 2021 16:33:43 +1100
Message-ID: <11226930.BYJfa7kJGD@nvdebian>
In-Reply-To: <YamNRcrLDOPjG9wg@xz-m1.local>

On Friday, 3 December 2021 2:21:41 PM AEDT Peter Xu wrote:
> On Thu, Dec 02, 2021 at 10:06:46PM +1100, Alistair Popple wrote:
> > On Tuesday, 16 November 2021 12:49:50 AM AEDT Peter Xu wrote:
> > > This check has existed since the first git commit of the Linux repository, but
> > > at that time there was no page migration yet, so I think it was okay.
> > >
> > > With page migration enabled, it should logically be possible that we zap some
> > > shmem pages during migration. When that happens, IIUC the old code could
> > > account the MM_SHMEMPAGES RSS counter wrongly, because we would zap the ptes
> > > without decrementing the counters for the migrating entries. I have no unit
> > > test to prove it, though, as I don't know an easy way to trigger this condition.
> > >
> > > Besides, IMHO the optimization itself is already confusing in a few ways:
> >
> > I've spent a bit of time looking at this and think it would be good to get it
> > cleaned up, as I've found it hard to follow in the past. What I haven't been
> > able to confirm is whether anything relies on skipping swap entries. From
> > your description it sounds like skipping swap entries was done as an
> > optimisation rather than for some functional reason; is that correct?
>
> Thanks again for looking into this patch, Alistair. I appreciate it a lot.
>
> I should say that it's how I understand this, and I could be wrong, that's the
> major reason why I marked this patch as RFC.

That makes two of us!

>
> As I mentioned, this behavior existed in the first commit of Linux's git history,
> at a time when there were no special swap entries at all: all swap entries
> were "real" swap entries for anonymous memory.
>
> That's why I think it was meant as an optimization: previously, when zap_details
> (along with zap_details->mapping in the old code) was non-null, the page was
> definitely not anonymous, so skipping swap entries for file-backed memory
> sounds like a good optimization.

Thanks. That was the detail I was trying to figure out, i.e. why something might
want to skip swap entries. I will spend some more time looking to be sure,
though.

> However, since then all kinds of swap entries have been introduced, and as you
> spotted, at least migration entries can exist for some file-backed memory
> types (shmem).
>
> >
> > > - The wording "skip swap entries" is confusing, because we're not skipping all
> > > swap entries - we handle device private/exclusive pages before that.
> > >
> > > - The skip behavior is enabled whenever a zap_details pointer is passed in.
> > >   It's very hard for a new zap caller to figure that out, because it's unclear
> > >   why we should skip swap entries when zap_details is specified.
> > >
> > > - On modern systems, especially in performance-critical use cases, swap
> > >   entries should be rare, so I doubt the usefulness of this optimization,
> > >   since it should be on a slow path anyway.
> > >
> > > - It is not aligned with what we do with huge pmd swap entries, where in
> > > zap_huge_pmd() we'll do the accounting unconditionally.
> > >
> > > This patch drops that trick, so we handle swap ptes coherently. Meanwhile, we
> > > should apply the same mapping check to migration entries too.
> >
> > I agree, and I'm not convinced the current handling is very good - if we
> > skip zapping a migration entry then the page mapping might get restored when
> > the migration entry is removed.
> >
> > In practice I don't think that is a problem as the migration entry target page
> > will be locked, and if I'm understanding things correctly callers of
> > unmap_mapping_*() need to have the page(s) locked anyway if they want to be
> > sure the page is unmapped. But it seems removing the migration entries better
> > matches the intent and I can't think of a reason why they should be skipped.
>
> Exactly, that's how I see it too.
>
> I used to think there was a bug in shmem migration (if you remember, I
> mentioned it in some of my previous patchset cover letters), but then I found
> that migration requires the page lock, so it's probably not a real bug at all.
> However, that's never a convincing reason to ignore swap entries.

Right, it also took me a while to convince myself there wasn't a bug there, so
if for some reason this patch doesn't end up going in, I think we should still
treat migration entries the same way as device-private entries.

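To make that fallback concrete, here is a rough userspace model of the two rules. It is only a sketch: every name in it (model_entry, model_details, model_zap_old(), model_zap_fallback(), integer mapping ids) is made up for illustration and is not a kernel helper, and it only tracks whether the pte would be cleared, not the rss accounting the real zap_pte_range() does.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the non-present pte cases in
 * zap_pte_range(); none of these names exist in the kernel.
 */
enum model_entry_kind {
	MODEL_SWAP,           /* genuine swap entry: private anon page */
	MODEL_MIGRATION,      /* can also belong to file-backed memory (shmem) */
	MODEL_DEVICE_PRIVATE, /* device private/exclusive entry */
};

struct model_entry {
	enum model_entry_kind kind;
	int mapping;          /* id of the mapping that owns the page */
};

/* Stand-in for zap_details; check_mapping == 0 means "no mapping filter". */
struct model_details {
	int check_mapping;
};

/* True when the caller asked to zap one mapping and this page is not in it. */
static bool model_other_mapping(const struct model_details *d,
				const struct model_entry *e)
{
	return d != NULL && d->check_mapping != 0 &&
	       d->check_mapping != e->mapping;
}

/*
 * Current behaviour: device private entries get the mapping check, every
 * other swap entry is left in place as soon as details is non-NULL.
 * Returns true if the entry would be zapped.
 */
static bool model_zap_old(const struct model_details *d,
			  const struct model_entry *e)
{
	if (e->kind == MODEL_DEVICE_PRIVATE)
		return !model_other_mapping(d, e);
	return d == NULL;
}

/*
 * Fallback: migration entries get the same mapping check as device
 * private entries; genuine swap entries keep the old behaviour.
 */
static bool model_zap_fallback(const struct model_details *d,
			       const struct model_entry *e)
{
	if (e->kind == MODEL_DEVICE_PRIVATE || e->kind == MODEL_MIGRATION)
		return !model_other_mapping(d, e);
	return d == NULL;
}
```

For a shmem migration entry whose mapping matches the one being zapped, model_zap_old() returns false (the entry is left behind, so the mapping can come back when migration completes), while model_zap_fallback() returns true. A migration entry belonging to a different mapping is still left alone by both.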
> I wanted to "ignore" this problem with the "add a flag to skip swap entries"
> patch, but as you saw it was not welcomed at all, so I had no choice but to
> try to find the fundamental reason for skipping swap entries. I couldn't find
> any good reason, and skipping even seems buggy, hence this patch. If this is
> the right way, the zap pte path can be simplified quite a lot after patch 2 of
> this series.

Yep, I think it's definitely worth trying to figure out. And if it turns out
there is some good reason for skipping, we had better make sure to document it
in a comment somewhere so none of this good research is lost. However, I haven't
yet come up with a reason why they need to be skipped either.

- Alistair

Thread overview: 27+ messages
2021-11-15 13:49 [PATCH RFC v2 0/2] mm: Rework zap ptes on swap entries Peter Xu
2021-11-15 13:49 ` [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified Peter Xu
2021-12-02 11:06 ` Alistair Popple
2021-12-03 3:21 ` Peter Xu
2021-12-03 5:33 ` Alistair Popple [this message]
2021-12-03 6:59 ` Peter Xu
2022-01-09 1:19 ` Hugh Dickins
2022-01-12 13:18 ` Peter Xu
2022-01-12 13:26 ` Peter Xu
2022-01-13 3:47 ` Hugh Dickins
2022-01-20 10:32 ` Peter Xu
2022-01-21 3:11 ` Peter Xu
2022-01-21 5:11 ` Peter Xu
2022-01-24 6:51 ` Hugh Dickins
2022-01-24 9:13 ` Peter Xu
2022-01-24 6:29 ` Hugh Dickins
2022-01-24 8:54 ` Peter Xu
2022-01-24 11:01 ` Peter Xu
2022-01-10 8:37 ` David Hildenbrand
2022-01-11 7:40 ` Alistair Popple
2022-01-11 9:00 ` David Hildenbrand
2021-11-15 13:49 ` [PATCH RFC v2 2/2] mm: Rework swap handling of zap_pte_range Peter Xu
2021-11-15 13:57 ` Matthew Wilcox
2021-11-16 5:06 ` Peter Xu
2021-11-16 8:51 ` John Hubbard
2021-11-16 13:11 ` Matthew Wilcox
2021-11-16 19:06 ` John Hubbard