From: Johannes Weiner <hannes@cmpxchg.org>
To: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>,
	Nhat Pham <nphamcs@gmail.com>,
	akpm@linux-foundation.org, sjenning@redhat.com,
	ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	Chris Li <chrisl@kernel.org>
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
Date: Wed, 27 Sep 2023 17:02:06 -0400
Message-ID: <20230927210206.GC399644@cmpxchg.org>
In-Reply-To: <CA+CLi1httFOg4OM-0Hu3+fOvya4kpacCqN7A0upqOt4-YJiECg@mail.gmail.com>

On Wed, Sep 27, 2023 at 09:48:10PM +0200, Domenico Cerasuolo wrote:
> > > @@ -485,6 +487,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > >         __folio_set_locked(folio);
> > >         __folio_set_swapbacked(folio);
> > >
> > > +       /*
> > > +        * The page fault itself might trigger reclaim of a zswap object
> > > +        * that corresponds to the same swap entry. Since the swap entry
> > > +        * has already been pinned above, the task would loop forever
> > > +        * trying to pin it again.
> > > +        *
> > > +        * To avoid this, remove the entry from the zswap LRU so that it
> > > +        * cannot be picked for reclaim.
> > > +        */
> > > +       zswap_lru_removed = zswap_remove_swpentry_from_lru(entry);
> > > +
> >
> > This will add a zswap lookup (and potentially an insertion below) in
> > every single swap fault path, right? Doesn't this introduce latency
> > regressions? I am also not a fan of having zswap-specific details in
> > this path.
> >
> > When you say "pinned", do you mean the call to swapcache_prepare()
> > above (i.e. setting SWAP_HAS_CACHE)? IIUC, the scenario you are
> > worried about is that the following call to charge the page may invoke
> > reclaim, go into zswap, and try to write back the same page we are
> > swapping in here. The writeback call will recurse into
> > __read_swap_cache_async(), call swapcache_prepare(), get -EEXIST,
> > and keep looping indefinitely. Is this correct?

Yeah, exactly.
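
To spell out the cycle for the archive (a simplified trace; frame names
abbreviated, readahead layers skipped):

    do_swap_page()
      __read_swap_cache_async()
        swapcache_prepare()               <- sets SWAP_HAS_CACHE
        mem_cgroup_swapin_charge_folio()  <- memcg hits its limit
          reclaim
            zswap writeback selects the same swap entry
              __read_swap_cache_async()
                swapcache_prepare()       <- -EEXIST, wait and retry

The inner call keeps waiting for SWAP_HAS_CACHE to resolve, but the
holder is the same task, stuck in the charge above it, so it never
does.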

> > If yes, can we handle this by adding a flag to
> > __read_swap_cache_async() that basically says "don't wait for
> > SWAP_HAS_CACHE and the swapcache to become consistent; if
> > swapcache_prepare() returns -EEXIST, just fail and return"? The zswap
> > writeback path can pass in this flag and skip such pages. We might
> > want to modify the writeback code to put those pages back at the end
> > of the LRU instead of at the beginning.
> 
> Thanks for the suggestion. This actually works and seems cleaner, so I think
> we'll go with your solution.

That sounds like a great idea.

It should be pointed out that these aren't perfectly
equivalent. Removing the entry from the LRU eliminates the lock
recursion scenario on that very specific entry.

Having writeback skip on -EEXIST will make it skip *any* pages that
are concurrently entering the swapcache, even when it *could* wait for
them to finish.

However, pages that are concurrently read back into memory are a poor
choice for writeback anyway, and likely to be removed from swap soon.

So it happens to work out just fine in this case. I'd just add a
comment that explains the recursion deadlock, as well as the
implication of skipping any busy entry and why that's okay.
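
Something like this, perhaps, in the swapcache_prepare() retry loop of
__read_swap_cache_async(), with zswap writeback passing the new flag.
This is a sketch only: the flag name (skip_if_exists) is made up here,
and the error unwinding is omitted.

	err = swapcache_prepare(entry);
	if (!err)
		break;
	if (err != -EEXIST)
		return NULL;	/* unwinding omitted in this sketch */

	/*
	 * -EEXIST: SWAP_HAS_CACHE is set, but the folio isn't in the
	 * swapcache yet. Usually that's a racing swapin that will finish
	 * shortly, so waiting below is fine. But if we got here from zswap
	 * writeback, the racing swapin may be the very fault whose memcg
	 * charge kicked off this reclaim. It cannot finish until we return,
	 * so waiting would spin forever. Callers that can't tolerate that
	 * (zswap writeback) pass skip_if_exists and we bail out instead.
	 * Skipping is fine: a page that is concurrently being swapped back
	 * in is a poor writeback candidate anyway.
	 */
	if (skip_if_exists)
		return NULL;	/* treat the entry as busy, move on */

	/* Wait for the racing swapin to finish, then retry. */
	schedule_timeout_uninterruptible(1);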

