From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
Nhat Pham <nphamcs@gmail.com>,
akpm@linux-foundation.org, sjenning@redhat.com,
ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
Chris Li <chrisl@kernel.org>
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
Date: Wed, 27 Sep 2023 14:07:45 -0700
Message-ID: <CAJD7tkbG0Rg-xM=wn8tnt=mXfty8-Y=6t0sbvtArrsnBPyH2cg@mail.gmail.com>
In-Reply-To: <20230927210206.GC399644@cmpxchg.org>
On Wed, Sep 27, 2023 at 2:02 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Sep 27, 2023 at 09:48:10PM +0200, Domenico Cerasuolo wrote:
> > > > @@ -485,6 +487,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > > > __folio_set_locked(folio);
> > > > __folio_set_swapbacked(folio);
> > > >
> > > > + /*
> > > > + * Page fault might itself trigger reclaim, on a zswap object that
> > > > + * corresponds to the same swap entry. However, as the swap entry has
> > > > + * previously been pinned, the task will run into an infinite loop trying
> > > > + * to pin the swap entry again.
> > > > + *
> > > > + * To prevent this from happening, we remove it from the zswap
> > > > + * LRU to prevent its reclamation.
> > > > + */
> > > > + zswap_lru_removed = zswap_remove_swpentry_from_lru(entry);
> > > > +
> > >
> > > This will add a zswap lookup (and potentially an insertion below) in
> > > every single swap fault path, right? Doesn't this introduce latency
> > > regressions? I am also not a fan of having zswap-specific details in
> > > this path.
> > >
> > > When you say "pinned", do you mean the call to swapcache_prepare()
> > > above (i.e. setting SWAP_HAS_CACHE)? IIUC, the scenario you are
> > > worried about is that the following call to charge the page may invoke
> > > reclaim, go into zswap, and try to writeback the same page we are
> > > swapping in here. The writeback call will recurse into
> > > __read_swap_cache_async(), call swapcache_prepare() and get EEXIST,
> > > and keep looping indefinitely. Is this correct?
>
> Yeah, exactly.
>
> > > If yes, can we handle this by adding a flag to
> > > __read_swap_cache_async() that basically says "don't wait for
> > > SWAP_HAS_CACHE and the swapcache to be consistent, if
> > > swapcache_prepare() returns EEXIST just fail and return"? The zswap
> > > writeback path can pass in this flag and skip such pages. We might
> > > want to modify the writeback code to put back those pages at the end
> > > of the lru instead of in the beginning.
> >
> > Thanks for the suggestion; this actually works and seems cleaner, so I
> > think we'll go for your solution.
>
> That sounds like a great idea.
>
> It should be pointed out that these aren't perfectly
> equivalent. Removing the entry from the LRU eliminates the lock
> recursion scenario on that very specific entry.
>
> Having writeback skip on -EEXIST will make it skip *any* pages that
> are concurrently entering the swapcache, even when it *could* wait for
> them to finish.
>
> However, pages that are concurrently read back into memory are a poor
> choice for writeback anyway, and likely to be removed from swap soon.
>
> So it happens to work out just fine in this case. I'd just add a
> comment that explains the recursion deadlock, as well as the
> implication of skipping any busy entry and why that's okay.
Good point, we will indeed skip the entry even if the concurrent
insertion into the swapcache is coming from a different CPU.
As you said, it works out just fine in this case, as the page will be
removed from zswap momentarily anyway. A comment is indeed due.
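
For reference, a possible draft of that comment (wording is only a
suggestion, to sit next to the skip-on-EEXIST check in the writeback
path):

```c
/*
 * -EEXIST means the entry is concurrently entering the swapcache.
 * Waiting for it can deadlock: the swapin path that set
 * SWAP_HAS_CACHE may itself be charging the page, recursing into
 * reclaim and back into this writeback on the same entry. Skip the
 * entry instead. This also skips entries busied by other tasks, but
 * a page being read back into memory is a poor writeback candidate
 * and will likely leave swap soon anyway.
 */
```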