Re: [PATCH v4] mm/swap: fix race when skipping swapcache

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Chris Li <chrisl@kernel.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: Barry Song <21cnbao@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 linux-mm <linux-mm@kvack.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	 Minchan Kim <minchan@kernel.org>,
	Barry Song <v-songbaohua@oppo.com>, Yu Zhao <yuzhao@google.com>,
	 SeongJae Park <sj@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	 Yosry Ahmed <yosryahmed@google.com>,
	Aaron Lu <aaron.lu@intel.com>,
	stable@vger.kernel.org,  LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4] mm/swap: fix race when skipping swapcache
Date: Tue, 27 Feb 2024 15:01:32 -0800	[thread overview]
Message-ID: <CAF8kJuOXj+Q5O=eLaRJOZrHDs7wygdTEfY3MfqGWQCmjEWQ_XQ@mail.gmail.com> (raw)
In-Reply-To: <CAMgjq7BHAk_6ktCruKq_Yc30n++yhUyKTqzQuJ9fPvGVNNSdVA@mail.gmail.com>

On Tue, Feb 27, 2024 at 10:14 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Wed, Feb 21, 2024 at 12:32 AM Chris Li <chrisl@kernel.org> wrote:
> >
> > On Mon, Feb 19, 2024 at 8:56 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > >
> > > Hi Barry,
> > >
> > > > it might not be a problem for throughput. but for real-time and tail latency,
> > > > this hurts. For example, this might increase dropping frames of UI which
> > > > is an important parameter to evaluate performance :-)
> > > >
> > >
> > > That's a true issue, as Chris mentioned before I think we need to
> > > think of some clever data struct to solve this more naturally in the
> > > future, similar issue exists for cached swapin as well and it has been
> > > there for a while. On the other hand I think maybe applications that
> > > are extremely latency sensitive should try to avoid swap on fault? A
> > > swapin could cause other issues like reclaim, throttled or contention
> > > with many other things, these seem to have a higher chance than this
> > > race.
> >
> >
> > Yes, I do think the best long term solution is to have some clever
> > data structure to solve the synchronization issue and allow racing
> > threads to make forward progress at the same time.
> >
> > I have also explored some (failed) synchronization ideas, for example
> > having the run time swap entry refcount separate from swap_map count.
> > BTW, zswap entry->refcount behaves like that, it is separate from swap
> > entry and manages the temporary run time usage count held by the
> > function. However that idea has its own problem as well, it needs to
> > have an xarray to track the swap entry run time refcount (only stored
> > in the xarray when CPU fails to get SWAP_HAS_CACHE bit.) When we are
> > done with page faults, we still need to look up the xarray to make
> > sure there is no racing CPU and put the refcount into the xarray. That
> >  kind of defeats the purpose of avoiding the swap cache in the first
> > place. We still need to do the xarray lookup in the normal path.
> >
> > I came to realize that, while this current fix is not perfect, (I
> > still wish we had a better solution not pausing the racing CPU). This
> > patch stands better than not fixing this data corruption issue and the
> > patch remains relatively simple. Yes it has latency issues but still
> > better than data corruption.  It also doesn't stop us from coming up
> > with better solutions later on. If we want to address the
> > synchronization in a way not blocking other CPUs, it will likely
> > require a much bigger change.
> >
> > Unless we have a better suggestion. It seems the better one among the
> > alternatives so far.
> >
>
> Hi,
>
> Thanks for the comments. I've been trying some ideas locally, I think a simple and straight solution exists: We just don't skip the swap cache xarray.

Yes, I have been pondering about that as well.

Notice in __read_swap_cache_async(), it has a similar
"schedule_timeout_uninterruptible(1)" when  swapcache_prepare(entry)
fails to grab the SWAP_HAS_CACHE bit. So falling back to use the swap
cache does not automatically solve the latency issue. Similar delay
exists in the swap cache case as well.

> The current reason we are skipping it is for performance, but with some optimization, the performance should be as good as skipping it (in current behavior). Notice even in the swap cache bypass path, we need to do one lookup, and one modify (delete the shadow). That can't be skipped. So the usage of swap cache can be better organized and optimized.

> After all swapin makes use of swap cache, swapin can insert the folio in swap cache xarray first, then set swap map cache bit. I'm thinking about reusing the folio lock, or having an intermediate value in xarray, so raced swapins can wait properly. There are some tricky parts syncing with swap maps though.

Inserting the swap cache xarray first and setting SWAP_HAS_CACHE bit
later will need more audit on the race. I assume you take the swap
device/cluster lock before folio insert into swap cache xarray?

Chris
>
> Currently working on a series, will send in a few weeks if it works.

     prev parent reply	other threads:[~2024-02-27 23:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19  8:20 Kairui Song
2024-02-19  8:47 ` Huang, Ying
2024-02-19  9:02 ` David Hildenbrand
2024-02-19 16:37 ` Chris Li
2024-02-19 22:10 ` Barry Song
2024-02-20  1:09   ` Huang, Ying
2024-02-20  4:49   ` Chengming Zhou
2024-02-20  5:32     ` Kairui Song
2024-02-20  5:37       ` Chengming Zhou
2024-02-20  1:31 ` Andrew Morton
2024-02-20  3:42   ` Kairui Song
2024-02-20  4:01     ` Barry Song
2024-02-20  4:56       ` Kairui Song
2024-02-20 10:26         ` Barry Song
2024-02-20 16:32         ` Chris Li
2024-02-27 18:13           ` Kairui Song
2024-02-27 23:01             ` Chris Li [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF8kJuOXj+Q5O=eLaRJOZrHDs7wygdTEfY3MfqGWQCmjEWQ_XQ@mail.gmail.com' \
    --to=chrisl@kernel.org \
    --cc=21cnbao@gmail.com \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=ryncsn@gmail.com \
    --cc=sj@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=v-songbaohua@oppo.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox