From: Chris Li <chrisl@kernel.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
"Wei Xu" <weixugc@google.com>, "Yu Zhao" <yuzhao@google.com>,
"Greg Thelen" <gthelen@google.com>,
"Chun-Tse Shao" <ctshao@google.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Brain Geffon" <bgeffon@google.com>,
"Minchan Kim" <minchan@kernel.org>,
"Michal Hocko" <mhocko@suse.com>,
"Mel Gorman" <mgorman@techsingularity.net>,
"Huang Ying" <ying.huang@intel.com>,
"Nhat Pham" <nphamcs@gmail.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Kairui Song" <kasong@tencent.com>,
"Zhongkun He" <hezhongkun.hzk@bytedance.com>,
"Kemeng Shi" <shikemeng@huaweicloud.com>,
"Barry Song" <v-songbaohua@oppo.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Joel Fernandes" <joel@joelfernandes.org>,
"Chengming Zhou" <zhouchengming@bytedance.com>
Subject: Re: [PATCH 2/2] mm: zswap.c: remove RB tree
Date: Fri, 19 Jan 2024 13:31:46 -0800 [thread overview]
Message-ID: <CAF8kJuOOwyzGGz21PKsGBdJBU3B+gk1+o78MwGqG5FDNZK9LSA@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkbdT-Bdi4YkOS_Px73dkhthq2qn6Yvy7CobzTBji0WMog@mail.gmail.com>
On Fri, Jan 19, 2024 at 11:37 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > I think using the xas_* APIs can be avoided here. The only reason we
> > > need it is that we want to check if there's an existing entry first,
> > > and return -EEXIST. However, in that case, the caller will replace it
> > > anyway (and do some operations on the dupentry):
> >
> > We might be able to for the insert case if we don't mind changing the
> > code behavior a bit. My original intent is to keep close to the
> > original zswap code and not stir the pot too much for the xarray
> > replacement. We can always make more adjustment once the RB tree is
> > gone.
>
> I don't see how this changes code behavior though. The current code in
> zswap_store() will do the following:
I am referring to the log and update counter happening after the zswap
mapping was updated. Maybe nobody actually cares about that behavior
difference. In my mind, there is a difference.
>
> - Hold the tree lock to make sure no one else modifies it.
> - Try to insert, check if there is already a dupentry at the index and
> return -EEXIST.
> - Warn, increment zswap_duplicate_entry, and invalidate the dupentry.
> - Try to insert again (this should always succeed since we are holding
> the lock).
>
> What I am proposing is:
> - zswap_xa_insert() is a thin wrapper around xa_store() (or we can
> remove it completely).
> - zswap_store() does the following:
> - Use zswap_xa_insert() and check if there is a returned dupentry.
> - Warn, increment zswap_duplicate_entry, and invalidate the dupentry.
>
> Either way, we always place the entry we have in the tree, and if
> there is a dupentry we warn and invalidate it. If anything, the latter
> is more straightforward.
>
> Am I missing something?
No, that is what I would do if I have to use the xa_* API.
>
> > >
> > > > }
> > > >
> > > > static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
> > > > {
> > > > + struct zswap_entry *e;
> > > > pgoff_t offset = swp_offset(entry->swpentry);
> > > > - if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > > > - struct zswap_entry *old;
> > > > - old = xa_erase(&tree->xarray, offset);
> > > > - BUG_ON(old != entry);
> > > > - rb_erase(&entry->rbnode, &tree->rbroot);
> > > > - RB_CLEAR_NODE(&entry->rbnode);
> > > > - return true;
> > > > - }
> > > > - return false;
> > > > + XA_STATE(xas, &tree->xarray, offset);
> > > > +
> > > > + do {
> > > > + xas_lock_irq(&xas);
> > > > + do {
> > > > + e = xas_load(&xas);
> > > > + } while (xas_retry(&xas, e));
> > > > + if (xas_valid(&xas) && e != entry) {
> > > > + xas_unlock_irq(&xas);
> > > > + return false;
> > > > + }
> > > > + xas_store(&xas, NULL);
> > > > + xas_unlock_irq(&xas);
> > > > + } while (xas_nomem(&xas, GFP_KERNEL));
> > > > + return !xas_error(&xas);
> > > > }
> > >
> > > Same here, I think we just want:
> > >
> > > return !!xa_erase(..);
> >
> > For the erase case it is tricky.
> > The current zswap code does not erase an entry if the tree entry at
> > the same offset has been changed. It should be fine if the new entry
> > is NULL. Basically some race to remove the entry already. However, if
> > the entry is not NULL, then force resetting it to NULL will change
> > behavior compared to the current.
>
> I see, very good point. I think we can use xa_cmpxchg() and pass in NULL?
>
That is certainly possible. Thanks for bringing it up.
Let me try to combine the tree->lock with xarray lock first. If
xa_cmpxchg() can simplify the result there, I will use it.
> Handling large folios in zswap is a much larger topic that involves a
> lot more than this xa_* vs. xas_* apis dispute. Let's not worry about
> this for now.
Ack. One more reason to use the XAS interface is that zswap currently
does multiple lookups on typical zswap_load(). It finds entries by
offset, for the entry (lookup one). Then after folio install to swap
cache, it deletes the entry, it will performan another lookup to
delete the entry (look up two). Using XAS might be able to cache the
node location for the second lookup to avoid the full node walk. That
is not in my current patch and can be a later improvement patch as
well.
Chris
next prev parent reply other threads:[~2024-01-19 21:32 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-18 3:05 [PATCH 0/2] RFC: zswap tree use xarray instead of " Chris Li
2024-01-18 3:05 ` [PATCH 1/2] mm: zswap.c: add xarray tree to zswap Chris Li
2024-01-18 6:20 ` Yosry Ahmed
2024-01-18 13:52 ` Matthew Wilcox
2024-01-18 16:59 ` Yosry Ahmed
2024-01-18 18:25 ` Matthew Wilcox
2024-01-19 5:28 ` Chris Li
2024-01-19 19:30 ` Yosry Ahmed
2024-01-19 5:24 ` Chris Li
2024-01-19 19:29 ` Yosry Ahmed
2024-01-19 20:04 ` Matthew Wilcox
2024-01-19 21:41 ` Yosry Ahmed
2024-01-19 22:05 ` Chris Li
2024-01-19 22:08 ` Yosry Ahmed
2024-01-18 3:05 ` [PATCH 2/2] mm: zswap.c: remove RB tree Chris Li
2024-01-18 6:35 ` Yosry Ahmed
2024-01-18 19:35 ` Yosry Ahmed
2024-01-19 5:49 ` Chris Li
2024-01-19 19:37 ` Yosry Ahmed
2024-01-19 5:43 ` Chris Li
2024-01-19 19:36 ` Yosry Ahmed
2024-01-19 21:31 ` Chris Li [this message]
2024-01-19 21:44 ` Yosry Ahmed
2024-01-18 6:01 ` [PATCH 0/2] RFC: zswap tree use xarray instead of " Yosry Ahmed
2024-01-18 6:39 ` Yosry Ahmed
2024-01-18 6:57 ` Chengming Zhou
2024-01-18 7:02 ` Yosry Ahmed
2024-01-18 7:19 ` Chris Li
2024-01-18 7:35 ` Chengming Zhou
2024-01-19 4:59 ` Chris Li
2024-01-19 6:18 ` Chengming Zhou
2024-01-19 10:26 ` Chris Li
2024-01-19 11:12 ` Chengming Zhou
2024-01-19 11:59 ` Chris Li
2024-01-18 6:48 ` Christopher Li
2024-01-18 7:05 ` Yosry Ahmed
2024-01-18 7:28 ` Chris Li
2024-01-18 17:14 ` Yosry Ahmed
2024-01-18 14:48 ` Johannes Weiner
2024-01-18 18:59 ` Liam R. Howlett
2024-01-19 5:13 ` Chris Li
2024-01-18 18:01 ` Nhat Pham
2024-01-19 5:14 ` Chris Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAF8kJuOOwyzGGz21PKsGBdJBU3B+gk1+o78MwGqG5FDNZK9LSA@mail.gmail.com \
--to=chrisl@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bgeffon@google.com \
--cc=ctshao@google.com \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=hezhongkun.hzk@bytedance.com \
--cc=joel@joelfernandes.org \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=nphamcs@gmail.com \
--cc=shikemeng@huaweicloud.com \
--cc=surenb@google.com \
--cc=v-songbaohua@oppo.com \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=yosryahmed@google.com \
--cc=yuzhao@google.com \
--cc=zhouchengming@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox