Re: [PATCH next 3/3] shmem: Fix "Unused swap" messages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	 William Kucharski <william.kucharski@oracle.com>,
	 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH next 3/3] shmem: Fix "Unused swap" messages
Date: Mon, 3 Jan 2022 20:43:13 -0800 (PST)	[thread overview]
Message-ID: <2fb90c3-5285-56ca-65af-439c4527dbe4@google.com> (raw)
In-Reply-To: <YdOlt5FJn9L+3sjM@casper.infradead.org>

On Tue, 4 Jan 2022, Matthew Wilcox wrote:
> On Mon, Jan 03, 2022 at 12:10:21PM -0800, Hugh Dickins wrote:
> > On Mon, 3 Jan 2022, Matthew Wilcox wrote:
> > > On Sun, Jan 02, 2022 at 05:35:50PM -0800, Hugh Dickins wrote:
> > > > shmem_swapin_page()'s swap_free() has occasionally been generating
> > > > "_swap_info_get: Unused swap offset entry" messages.  Usually that's
> > > > no worse than noise; but perhaps it indicates a worse case, when we
> > > > might there be freeing swap already reused by others.
> > > > 
> > > > The multi-index xas_find_conflict() loop in shmem_add_to_page_cache()
> > > > did not allow for entry found NULL when expected to be non-NULL, so did
> > > > not catch that race when the swap has already been freed.
> > > > 
> > > > The loop would not actually catch a realistic conflict which the single
> > > > check does not catch, so revert it back to the single check.
> > > 
> > > I think what led to the loop was concern for the xa_state if trying
> > > to find a swap entry that's smaller than the size of the folio.
> > > So yes, the loop was expected to execute twice, but I didn't consider
> > > the case where we were looking for something non-NULL and actually found
> > > NULL.
> > > 
> > > So should we actually call xas_find_conflict() twice (if we're looking
> > > for something non-NULL), and check that we get @expected, followed by
> > > NULL?
> > 
> > Sorry, I've no idea.
> > 
> > You say "twice", and that does not fit the imaginary model I had when I
> > said "The loop would not actually catch a realistic conflict which the
> > single check does not catch".
> > 
> > I was imagining it either looking at a single entry, or looking at an
> > array of (perhaps sometimes in shmem's case 512) entries, looking for
> > conflict with the supplied pointer/value expected there.
> > 
> > The loop technique was already unable to report on unexpected NULLs,
> > and the single test would catch a non-NULL entry different from an
> > expected non-NULL entry.  Its only relative weakness appeared to be
> > if that array contained (perhaps some NULLs then) a "narrow" instance
> > of the same pointer/value that was expected to fill the array; and I
> > didn't see any possibility for shmem to be inserting small and large
> > folios sharing the same address at the same time.
> > 
> > That "explanation" may make no sense to you, don't worry about it;
> > just as "twice" makes no immediate sense to me - I'd have to go off
> > and study multi-index XArray to make sense of it, which I'm not
> > about to do.
> > 
> > I've seen no problems with the proposed patch, but if you see a real
> > case that it's failing to cover, yes, please do improve it of course.
> > 
> > Though now I'm wondering if the "loop" totally misled me; and your
> > "twice" just means that we need to test first this and then that and
> > we're done - yeah, maybe.
> 
> Sorry; I wrote the above in a hurry and early in the morning, so probably
> even less coherent than usual.  Also, the multi-index xarray code is
> new to everyone (and adds new things to consider), so it's no surprise
> we're talking past each other.  It's a bit strange for me to read your
> explanations because you're only reading what I actually wrote instead
> of what I intended to write.
> 
> So let me try again.  My concern was that we might be trying to store
> a 2MB entry which had a non-NULL 'expected' entry which was found in a
> 4k (ie single-index) slot within the 512 entries (as the first non-NULL
> entry in that range), and we'd then store the 2MB entry into a
> single-entry slot.

Thanks, that sounds much more like how I was imagining it.  And the
two separate tests much more understandable than twice round a loop.

> 
> Now, maybe that can't happen for higher-level reasons, and I don't need
> to worry about it.  But I feel like we should check for that?  Anyway,
> I think the right fix is this:

I don't object to (cheaply) excluding a possibility at the low level,
even if there happen to be high level reasons why it cannot happen at
present.

But I don't think your second xas_find_conflict() gives quite as much
assurance as you're expecting of it.  Since xas_find_conflict() skips
NULLs, the conflict test would pass if there were 511 NULLs and one
4k entry matching the expected entry, but a 2MB entry to be inserted
(the "small and large folios" case in my earlier ramblings).

I think xas_find_conflict() is not really up to giving that assurance;
and maybe better than a second call to xas_find_conflict(), might be
a VM_BUG_ON earlier, to say that if 'expected' is non-NULL, then the
range is PAGE_SIZE - or something more folio-friendly like that?
That would give you the extra assurance you're looking for,
wouldn't it?

For now and the known future, shmem only swaps PAGE_SIZE, out and in;
maybe someone will want to change that one day, then xas_find_conflict()
could be enhanced to know more of what's expected.

> 
> +++ b/mm/shmem.c
> @@ -733,11 +733,12 @@ static int shmem_add_to_page_cache(struct page *page,
>         cgroup_throttle_swaprate(page, gfp);
> 
>         do {
> -               void *entry;
>                 xas_lock_irq(&xas);
> -               while ((entry = xas_find_conflict(&xas)) != NULL) {
> -                       if (entry == expected)
> -                               continue;
> +               if (expected != xas_find_conflict(&xas)) {
> +                       xas_set_err(&xas, -EEXIST);
> +                       goto unlock;
> +               }
> +               if (expected && xas_find_conflict(&xas)) {
>                         xas_set_err(&xas, -EEXIST);
>                         goto unlock;
>                 }

That also worried me because, if the second xas_find_conflict()
is to make any sense, the first must have had a side-effect on xas:
are those side-effects okay for the subsequent xas_store(&xas, page)?
You'll know that they are, but it's not obvious to the reader.

> 
> which says what I mean.  I certainly didn't intend to imply that I
> was expecting to see 512 consecutive entries which were all identical,
> which would be the idiomatic way to read the code that was there before.
> I shouldn't've tried to be so concise.
> 
> (If you'd rather I write any of this differently, I'm more than happy
> to change it)

No, I'm happy with the style of it, just discontented that the second
xas_find_conflict() pretends to more than it provides (I think).

Hugh

next prev parent reply	other threads:[~2022-01-04  4:43 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-03  1:35 Hugh Dickins
2022-01-03 15:36 ` Matthew Wilcox
2022-01-03 20:10   ` Hugh Dickins
2022-01-04  1:41     ` Matthew Wilcox
2022-01-04  4:43       ` Hugh Dickins [this message]
2022-01-04  8:12         ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fb90c3-5285-56ca-65af-439c4527dbe4@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox