linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Chris Li <chrisl@kernel.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Wei Xu" <weixugc@google.com>, "Yu Zhao" <yuzhao@google.com>,
	"Greg Thelen" <gthelen@google.com>,
	"Chun-Tse Shao" <ctshao@google.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Brain Geffon" <bgeffon@google.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Huang Ying" <ying.huang@intel.com>,
	"Nhat Pham" <nphamcs@gmail.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Kairui Song" <kasong@tencent.com>,
	"Zhongkun He" <hezhongkun.hzk@bytedance.com>,
	"Kemeng Shi" <shikemeng@huaweicloud.com>,
	"Barry Song" <v-songbaohua@oppo.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Chengming Zhou" <zhouchengming@bytedance.com>
Subject: Re: [PATCH 2/2] mm: zswap.c: remove RB tree
Date: Thu, 18 Jan 2024 21:43:39 -0800	[thread overview]
Message-ID: <CAF8kJuMwF=s-28OPFbCfJf-f3jsfYmyiP6pSBjj8ZgkGmbT9ZA@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkZF102x_8LKAX+sxAttgYD_LNT3cRqeOr7_euwPfNdCFA@mail.gmail.com>

Hi Yosry,

On Wed, Jan 17, 2024 at 10:35 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > @@ -493,45 +471,47 @@ static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
> >  static int zswap_insert(struct zswap_tree *tree, struct zswap_entry *entry,
> >                         struct zswap_entry **dupentry)
> >  {
> > -       struct rb_root *root = &tree->rbroot;
> > -       struct rb_node **link = &root->rb_node, *parent = NULL;
> > -       struct zswap_entry *myentry, *old;
> > -       pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
> > -
> > -
> > -       while (*link) {
> > -               parent = *link;
> > -               myentry = rb_entry(parent, struct zswap_entry, rbnode);
> > -               myentry_offset = swp_offset(myentry->swpentry);
> > -               if (myentry_offset > entry_offset)
> > -                       link = &(*link)->rb_left;
> > -               else if (myentry_offset < entry_offset)
> > -                       link = &(*link)->rb_right;
> > -               else {
> > -                       old = xa_load(&tree->xarray, entry_offset);
> > -                       BUG_ON(old != myentry);
> > -                       *dupentry = myentry;
> > +       struct zswap_entry *e;
> > +       pgoff_t offset = swp_offset(entry->swpentry);
> > +       XA_STATE(xas, &tree->xarray, offset);
> > +
> > +       do {
> > +               xas_lock_irq(&xas);
> > +               do {
> > +                       e = xas_load(&xas);
> > +                       if (xa_is_zero(e))
> > +                               e = NULL;
> > +               } while (xas_retry(&xas, e));
> > +               if (xas_valid(&xas) && e) {
> > +                       xas_unlock_irq(&xas);
> > +                       *dupentry = e;
> >                         return -EEXIST;
> >                 }
> > -       }
> > -       rb_link_node(&entry->rbnode, parent, link);
> > -       rb_insert_color(&entry->rbnode, root);
> > -       old = xa_store(&tree->xarray, entry_offset, entry, GFP_KERNEL);
> > -       return 0;
> > +               xas_store(&xas, entry);
> > +               xas_unlock_irq(&xas);
> > +       } while (xas_nomem(&xas, GFP_KERNEL));
> > +       return xas_error(&xas);
>
> I think using the xas_* APIs can be avoided here. The only reason we
> need it is that we want to check if there's an existing entry first,
> and return -EEXIST. However, in that case, the caller will replace it
> anyway (and do some operations on the dupentry):

We might be able to for the insert case if we don't mind changing the
code behavior a bit. My original intent is to keep close to the
original zswap code and not stir the pot too much for the xarray
replacement. We can always make more adjustment once the RB tree is
gone.

> while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
>         WARN_ON(1);
>         zswap_duplicate_entry++;
>         zswap_invalidate_entry(tree, dupentry);
> }
>
> So I think we can do something like this in zswap_insert() instead:
>
> dupentry = xa_store(..);
> if (WARN_ON(dupentry)) {
>         zswap_duplicate_entry++;
>         zswap_invalidate_entry(tree, dupentry);
> }
>
> WDYT?
>
> >  }
> >
> >  static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
> >  {
> > +       struct zswap_entry *e;
> >         pgoff_t offset = swp_offset(entry->swpentry);
> > -       if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > -               struct zswap_entry *old;
> > -               old = xa_erase(&tree->xarray, offset);
> > -               BUG_ON(old != entry);
> > -               rb_erase(&entry->rbnode, &tree->rbroot);
> > -               RB_CLEAR_NODE(&entry->rbnode);
> > -               return true;
> > -       }
> > -       return false;
> > +       XA_STATE(xas, &tree->xarray, offset);
> > +
> > +       do {
> > +               xas_lock_irq(&xas);
> > +               do {
> > +                       e = xas_load(&xas);
> > +               } while (xas_retry(&xas, e));
> > +               if (xas_valid(&xas) && e != entry) {
> > +                       xas_unlock_irq(&xas);
> > +                       return false;
> > +               }
> > +               xas_store(&xas, NULL);
> > +               xas_unlock_irq(&xas);
> > +       } while (xas_nomem(&xas, GFP_KERNEL));
> > +       return !xas_error(&xas);
> >  }
>
> Same here, I think we just want:
>
> return !!xa_erase(..);

For the erase case it is tricky.
The current zswap code does not erase an entry if the tree entry at
the same offset has been changed. It should be fine if the new entry
is NULL. Basically some race to remove the entry already. However, if
the entry is not NULL, then force resetting it to NULL will change
behavior compared to the current.

From reading the zswap code, I am not sure this is in theory
impossible to happen: Some one races to remove the zswap entry then
reclaim converts that page back to the zswap, with a new entry. By the
time the zswap_erase() tries to erase the current entry, the entry has
changed to a new entry. It seems the right thing to do in this
unlikely case is that we should skip the force erase and drop the
current entry, assuming someone else has already invalidated it. Don't
touch the entry in the tree, we assume it is a newer generation.

If we can reason the above is impossible to happen, then the force
erase and reset the entry to NULL should be fine(Famous last word).
Noted that this is a behavior change, I would like to seperate it out
from the drop in replacement patch(keep the original behavior)

Chris




>
> >
> >  static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
> > @@ -583,7 +563,6 @@ static void zswap_entry_put(struct zswap_tree *tree,
> >
> >         WARN_ON_ONCE(refcount < 0);
> >         if (refcount == 0) {
> > -               WARN_ON_ONCE(!RB_EMPTY_NODE(&entry->rbnode));
> >                 zswap_free_entry(entry);
> >         }
>
> nit: the braces are no longer needed here
>


  parent reply	other threads:[~2024-01-19  5:43 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-18  3:05 [PATCH 0/2] RFC: zswap tree use xarray instead of " Chris Li
2024-01-18  3:05 ` [PATCH 1/2] mm: zswap.c: add xarray tree to zswap Chris Li
2024-01-18  6:20   ` Yosry Ahmed
2024-01-18 13:52     ` Matthew Wilcox
2024-01-18 16:59       ` Yosry Ahmed
2024-01-18 18:25         ` Matthew Wilcox
2024-01-19  5:28           ` Chris Li
2024-01-19 19:30             ` Yosry Ahmed
2024-01-19  5:24     ` Chris Li
2024-01-19 19:29       ` Yosry Ahmed
2024-01-19 20:04         ` Matthew Wilcox
2024-01-19 21:41           ` Yosry Ahmed
2024-01-19 22:05             ` Chris Li
2024-01-19 22:08               ` Yosry Ahmed
2024-01-18  3:05 ` [PATCH 2/2] mm: zswap.c: remove RB tree Chris Li
2024-01-18  6:35   ` Yosry Ahmed
2024-01-18 19:35     ` Yosry Ahmed
2024-01-19  5:49       ` Chris Li
2024-01-19 19:37         ` Yosry Ahmed
2024-01-19  5:43     ` Chris Li [this message]
2024-01-19 19:36       ` Yosry Ahmed
2024-01-19 21:31         ` Chris Li
2024-01-19 21:44           ` Yosry Ahmed
2024-01-18  6:01 ` [PATCH 0/2] RFC: zswap tree use xarray instead of " Yosry Ahmed
2024-01-18  6:39   ` Yosry Ahmed
2024-01-18  6:57     ` Chengming Zhou
2024-01-18  7:02       ` Yosry Ahmed
2024-01-18  7:19         ` Chris Li
2024-01-18  7:35           ` Chengming Zhou
2024-01-19  4:59             ` Chris Li
2024-01-19  6:18               ` Chengming Zhou
2024-01-19 10:26                 ` Chris Li
2024-01-19 11:12                   ` Chengming Zhou
2024-01-19 11:59                     ` Chris Li
2024-01-18  6:48   ` Christopher Li
2024-01-18  7:05     ` Yosry Ahmed
2024-01-18  7:28       ` Chris Li
2024-01-18 17:14         ` Yosry Ahmed
2024-01-18 14:48       ` Johannes Weiner
2024-01-18 18:59     ` Liam R. Howlett
2024-01-19  5:13       ` Chris Li
2024-01-18 18:01 ` Nhat Pham
2024-01-19  5:14   ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF8kJuMwF=s-28OPFbCfJf-f3jsfYmyiP6pSBjj8ZgkGmbT9ZA@mail.gmail.com' \
    --to=chrisl@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bgeffon@google.com \
    --cc=ctshao@google.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hezhongkun.hzk@bytedance.com \
    --cc=joel@joelfernandes.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=surenb@google.com \
    --cc=v-songbaohua@oppo.com \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    --cc=yuzhao@google.com \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox