From: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Seth Jennings <sjenning@redhat.com>,
Dan Streetman <ddstreet@ieee.org>,
Vitaly Wool <vitaly.wool@konsulko.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>, Yu Zhao <yuzhao@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: zswap: fix double invalidate with exclusive loads
Date: Wed, 21 Jun 2023 21:36:41 +0200
Message-ID: <CA+CLi1hPfvy_kJyi8N6ygNhY9hNH5J6-kN9i1pRZz76dX5b0Lg@mail.gmail.com>
In-Reply-To: <CAJD7tkYGz3A3-mkzbZBfoHX5gATPseqiwZon0i3rug2h2M3jyg@mail.gmail.com>
On Wed, Jun 21, 2023 at 7:26 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo
> <cerasuolodomenico@gmail.com> wrote:
> >
> > On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > If exclusive loads are enabled for zswap, we invalidate the entry before
> > > returning from zswap_frontswap_load(), after dropping the local
> > > reference. However, the tree lock is dropped during decompression after
> > > the local reference is acquired, so the entry could be invalidated
> > > before we drop the local ref. If this happens, the entry is freed once
> > > we drop the local ref, and zswap_invalidate_entry() tries to invalidate
> > > an already freed entry.
> > >
> > > Fix this by:
> > > (a) Making sure zswap_invalidate_entry() is always called with a local
> > > ref held, to avoid being called on a freed entry.
> > > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> > > was actually on the rbtree. Otherwise, another invalidation could
> > > have already happened, and the initial ref is already dropped.
> > >
> > > With these changes, there is no need to make sure the entry still exists
> > > in the tree in zswap_reclaim_entry() before invalidating it, as
> > > zswap_invalidate_entry() will make this check internally.
> > >
> > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > > Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > ---
> > > mm/zswap.c | 21 ++++++++++++---------
> > > 1 file changed, 12 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 87b204233115..62195f72bf56 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> > >  	return 0;
> > >  }
> > >
> > > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > >  {
> > >  	if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > >  		rb_erase(&entry->rbnode, root);
> > >  		RB_CLEAR_NODE(&entry->rbnode);
> > > +		return true;
> > >  	}
> > > +	return false;
> > >  }
> > >
> > >  /*
> > > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> > >  	return NULL;
> > >  }
> > >
> > > +/*
> > > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > > + * from the tree. This function must be called with an additional ref held,
> > > + * otherwise it may race with another invalidation freeing the entry.
> > > + */
> >
> > On re-reading this comment, there's one thing I'm not sure I get: do we
> > really need to hold an additional local ref to call this? As far as I
> > understand, once we check that the entry was in the tree before putting
> > the initial ref, there's no need for an additional local one.
>
> I believe it is, but please correct me if I am wrong. Consider the
> following scenario:
>
> // Initially refcount is at 1
>
> CPU#1:                          CPU#2:
> spin_lock(tree_lock)
> zswap_entry_get() // 2 refs
> spin_unlock(tree_lock)
>                                 spin_lock(tree_lock)
>                                 zswap_invalidate_entry() // 1 ref
>                                 spin_unlock(tree_lock)
> zswap_entry_put() // 0 refs
> zswap_invalidate_entry() // problem
>
> That last zswap_invalidate_entry() call in CPU#1 is problematic. The
> entry would have already been freed. If we check that the entry is on
> the tree by checking RB_EMPTY_NODE(&entry->rbnode), then we are
> reading already freed and potentially re-used memory.
>
> We would need to search the tree to make sure the same entry still
> exists in the tree (aka what zswap_reclaim_entry() currently does).
> Having to do the lookup twice in the fault path is not ideal.
Thanks for the clarification, it is indeed needed in that case. I was just
wondering whether the wording of the comment is accurate: what a caller of
zswap_invalidate_entry() must ensure is that the entry has not been freed,
which does not strictly require holding an additional reference if a lookup
can serve the same purpose.
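To make the CPU#1/CPU#2 diagram concrete, here is a hypothetical userspace replay of it, not kernel code: `struct race_entry`, `race_put()`, `race_invalidate()` and `replay()` are invented names, the two CPUs are interleaved sequentially in the order the diagram shows, and "freeing" only sets a flag so that touching a dead entry is counted instead of crashing. `race_invalidate()` already follows the patched logic (drop the initial ref only if the entry was still on the tree), which illustrates that the conditional put alone does not help when it runs after the last reference is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an entry; not struct zswap_entry. */
struct race_entry {
	int refcount;  /* starts at 1: the initial ref from entry creation */
	bool on_tree;  /* models !RB_EMPTY_NODE(&entry->rbnode) */
	bool freed;    /* set instead of calling free() */
	int uaf;       /* accesses made after the entry was "freed" */
};

static void race_put(struct race_entry *e)
{
	if (e->freed) {
		e->uaf++;
		return;
	}
	if (--e->refcount == 0)
		e->freed = true;
}

/* Patched-style invalidation: drop the initial ref only if still on the tree. */
static void race_invalidate(struct race_entry *e)
{
	if (e->freed) {  /* would read entry->rbnode from freed memory */
		e->uaf++;
		return;
	}
	if (e->on_tree) {
		e->on_tree = false;
		race_put(e);
	}
}

/*
 * Replays the diagram. put_first = true is the order shown (local ref dropped
 * before invalidating); false invalidates while the local ref is still held.
 * Returns the number of accesses to an already-freed entry.
 */
static int replay(bool put_first)
{
	struct race_entry e = { .refcount = 1, .on_tree = true };

	e.refcount++;            /* CPU#1: zswap_entry_get(), 2 refs */
	race_invalidate(&e);     /* CPU#2: zswap_invalidate_entry(), 1 ref */
	if (put_first) {
		race_put(&e);        /* CPU#1: zswap_entry_put(), 0 refs, freed */
		race_invalidate(&e); /* CPU#1: the "problem" call */
	} else {
		race_invalidate(&e); /* CPU#1: still alive, off the tree: no put */
		race_put(&e);        /* CPU#1: last ref dropped, entry freed */
	}
	return e.uaf;
}
```

`replay(true)` follows the diagram and records one access to a freed entry; `replay(false)` invalidates while the local reference is still held and records none.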
>
> Also, in zswap_reclaim_entry(), if we called zswap_invalidate_entry() after
> dropping the local ref, is it possible that the swap entry has been reused
> for a different page by then? I didn't look closely, but if so, the slab
> allocator may have repurposed the zswap_entry, and we may find the entry in
> the tree for the same offset even though it now refers to a different page.
> This sounds practically unlikely but perhaps theoretically possible.
I'm not sure I understood the scenario: in zswap_reclaim_entry() we keep a local
reference until the end precisely to prevent the entry from being freed.
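The reuse concern in the quoted paragraph can be modelled with a one-slot allocator. This is a hypothetical sketch of the case where the caller holds no reference, which, as noted above, is not what zswap_reclaim_entry() does; every name in it (`aba_alloc()`, `aba_search()`, `stale_pointer_passes_lookup()` and so on) is invented, and the single static slot stands in for a slab cache handing a freed object straight back out. It shows that a check of the form `entry == zswap_rb_search(...)` compares addresses only and cannot tell a recycled object from the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zswap entry; not the kernel struct. */
struct aba_entry {
	unsigned long offset; /* swap offset the entry is stored under */
	int page_id;          /* identifies which page's data it holds */
};

/* One-slot "slab cache": a freed object is handed straight back out. */
static struct aba_entry slot;
static int slot_in_use;

/* One-entry "tree" keyed by swap offset. */
static struct aba_entry *tree_entry;

static struct aba_entry *aba_alloc(void)
{
	assert(!slot_in_use);
	slot_in_use = 1;
	return &slot;
}

static void aba_free(struct aba_entry *e)
{
	assert(e == &slot && slot_in_use);
	slot_in_use = 0;
}

static struct aba_entry *aba_search(unsigned long offset)
{
	if (tree_entry && tree_entry->offset == offset)
		return tree_entry;
	return NULL;
}

/*
 * Returns 1 if a pointer-equality check against a fresh lookup accepts a
 * stale pointer whose object now holds a different page.
 */
static int stale_pointer_passes_lookup(void)
{
	struct aba_entry *old, *reused, *found;

	slot_in_use = 0;
	tree_entry = NULL;

	/* Entry for page 1 at swap offset 42; we keep `old` without a ref. */
	old = aba_alloc();
	old->offset = 42;
	old->page_id = 1;
	tree_entry = old;

	/* Meanwhile the entry is invalidated and freed. */
	tree_entry = NULL;
	aba_free(old);

	/* Offset 42 is reused for page 2 and the allocator recycles memory. */
	reused = aba_alloc();
	reused->offset = 42;
	reused->page_id = 2;
	tree_entry = reused;

	/* The stale check compares addresses only. */
	found = aba_search(42);
	return found == old && found->page_id != 1;
}
```

`stale_pointer_passes_lookup()` returns 1: the stale pointer matches the lookup result even though the object now holds a different page. Holding a reference across the check rules this out, because the object cannot be freed and recycled in the meantime.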
>
> I think it's more reliable to call zswap_invalidate_entry() on an
> entry that we know is valid before dropping the local ref, especially
> since it's easy to do today by just moving a few lines around.
>
>
>
>
> >
> > >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> > >  				   struct zswap_entry *entry)
> > >  {
> > > -	/* remove from rbtree */
> > > -	zswap_rb_erase(&tree->rbroot, entry);
> > > -
> > > -	/* drop the initial reference from entry creation */
> > > -	zswap_entry_put(tree, entry);
> > > +	if (zswap_rb_erase(&tree->rbroot, entry))
> > > +		zswap_entry_put(tree, entry);
> > >  }
> > >
> > >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> > >  	 * swapcache. Drop the entry from zswap - unless invalidate already
> > >  	 * took it out while we had the tree->lock released for IO.
> > >  	 */
> > > -	if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > > -		zswap_invalidate_entry(tree, entry);
> > > +	zswap_invalidate_entry(tree, entry);
> > >
> > >  put_unlock:
> > >  	/* Drop local reference */
> > > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >  		count_objcg_event(entry->objcg, ZSWPIN);
> > >  freeentry:
> > >  	spin_lock(&tree->lock);
> > > -	zswap_entry_put(tree, entry);
> > >  	if (!ret && zswap_exclusive_loads_enabled) {
> > >  		zswap_invalidate_entry(tree, entry);
> > >  		*exclusive = true;
> > > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >  		list_move(&entry->lru, &entry->pool->lru);
> > >  		spin_unlock(&entry->pool->lru_lock);
> > >  	}
> > > +	zswap_entry_put(tree, entry);
> > >  	spin_unlock(&tree->lock);
> > >
> > >  	return ret;
> > > --
> > > 2.41.0.162.gfafddb0af9-goog
> > >
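The rule (b) change to zswap_invalidate_entry() can be checked with a small userspace model. This is a hypothetical sketch, not kernel code: `struct model_entry`, `invalidate_old()`, `invalidate_new()`, `double_invalidate()` and the other helpers are invented, `on_tree` stands in for `!RB_EMPTY_NODE(&entry->rbnode)`, and freeing is recorded in a flag so that a put on a dead entry is counted rather than being undefined behaviour. It replays two invalidations of the same entry while one CPU holds a local reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct zswap_entry (not the kernel type). */
struct model_entry {
	int refcount;  /* starts at 1: the initial ref from entry creation */
	bool on_tree;  /* models !RB_EMPTY_NODE(&entry->rbnode) */
	bool freed;    /* recorded instead of calling free() */
	int errors;    /* puts on an already-freed entry */
};

static void model_put(struct model_entry *e)
{
	if (e->freed) {
		e->errors++;
		return;
	}
	if (--e->refcount == 0)
		e->freed = true;
}

/* Models the patched zswap_rb_erase(): report whether we removed the node. */
static bool model_rb_erase(struct model_entry *e)
{
	bool was_on_tree = e->on_tree;

	e->on_tree = false;
	return was_on_tree;
}

/* Pre-patch zswap_invalidate_entry(): always drops the initial ref. */
static void invalidate_old(struct model_entry *e)
{
	model_rb_erase(e);
	model_put(e);
}

/* Patched zswap_invalidate_entry(): only the remover drops the initial ref. */
static void invalidate_new(struct model_entry *e)
{
	if (model_rb_erase(e))
		model_put(e);
}

/*
 * CPU#1 holds a local ref while another path invalidates the entry; CPU#1
 * then invalidates it too and drops its local ref. Returns the error count.
 */
static int double_invalidate(void (*invalidate)(struct model_entry *))
{
	struct model_entry e = { .refcount = 1, .on_tree = true };

	e.refcount++;    /* CPU#1: local ref, 2 refs */
	invalidate(&e);  /* other path: removes entry, 1 ref */
	invalidate(&e);  /* CPU#1: must not drop the initial ref again */
	model_put(&e);   /* CPU#1: drop local ref */
	return e.errors;
}
```

With the pre-patch helper, the second invalidation drops the initial reference a second time and frees the entry while CPU#1 still holds what it believes is a reference, so `double_invalidate(invalidate_old)` counts one put on a freed entry; `double_invalidate(invalidate_new)` counts none.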
Thread overview: 7+ messages
2023-06-21 9:30 Yosry Ahmed
2023-06-21 10:20 ` Domenico Cerasuolo
2023-06-21 17:26 ` Yosry Ahmed
2023-06-21 19:36 ` Domenico Cerasuolo [this message]
2023-06-21 21:22 ` Yosry Ahmed
2023-06-22 6:32 ` Domenico Cerasuolo
2023-06-22 6:39 ` Yosry Ahmed