Re: [RFC PATCH 10/10] mm/swap: optimize synchronous swapin

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org,  Chris Li <chrisl@kernel.org>,
	 Minchan Kim <minchan@kernel.org>,
	 Barry Song <v-songbaohua@oppo.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	 Yu Zhao <yuzhao@google.com>,  SeongJae Park <sj@kernel.org>,
	 David Hildenbrand <david@redhat.com>,
	 Yosry Ahmed <yosryahmed@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Matthew Wilcox <willy@infradead.org>,
	 Nhat Pham <nphamcs@gmail.com>,
	 Chengming Zhou <zhouchengming@bytedance.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 10/10] mm/swap: optimize synchronous swapin
Date: Wed, 27 Mar 2024 16:16:30 +0800	[thread overview]
Message-ID: <87il18m6n5.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <CAMgjq7D9-6JXOzpd18t8MSBAotHgEG2YZbi4efNkJiwiSJyJmw@mail.gmail.com> (Kairui Song's message of "Wed, 27 Mar 2024 15:14:03 +0800")

Kairui Song <ryncsn@gmail.com> writes:

> On Wed, Mar 27, 2024 at 2:49 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Kairui Song <ryncsn@gmail.com> writes:
>>
>> > On Wed, Mar 27, 2024 at 2:24 PM Huang, Ying <ying.huang@intel.com> wrote:
>> >>
>> >> Kairui Song <ryncsn@gmail.com> writes:
>> >>
>> >> > From: Kairui Song <kasong@tencent.com>
>> >> >
>> >> > Interestingly the major performance overhead of synchronous is actually
>> >> > from the workingset nodes update, that's because synchronous swap in
>> >>
>> >> If it's the major overhead, why not make it the first optimization?
>> >
>> > This performance issue became much more obvious after doing other
>> > optimizations, and other optimizations are for general swapin not only
>> > for synchronous swapin, that's also how I optimized things step by
>> > step, so I kept my patch order...
>> >
>> > And it is easier to do this after Patch 8/10 which introduces the new
>> > interface for swap cache.
>> >
>> >>
>> >> > keeps adding single folios into a xa_node, making the node no longer
>> >> > a shadow node and have to be removed from shadow_nodes, then remove
>> >> > the folio very shortly and making the node a shadow node again,
>> >> > so it has to add back to the shadow_nodes.
>> >>
>> >> The folio is removed only if should_try_to_free_swap() returns true?
>> >>
>> >> > Mark synchronous swapin folio with a special bit in swap entry embedded
>> >> > in folio->swap, as we still have some usable bits there. Skip workingset
>> >> > node update on insertion of such folio because it will be removed very
>> >> > quickly, and will trigger the update ensuring the workingset info is
>> >> > eventual consensus.
>> >>
>> >> Is this safe?  Is it possible for the shadow node to be reclaimed after
>> >> the folio are added into node and before being removed?
>> >
>> > If a xa node contains any non-shadow entry, it can't be reclaimed,
>> > shadow_lru_isolate will check and skip such nodes in case of race.
>>
>> In shadow_lru_isolate(),
>>
>>         /*
>>          * The nodes should only contain one or more shadow entries,
>>          * no pages, so we expect to be able to remove them all and
>>          * delete and free the empty node afterwards.
>>          */
>>         if (WARN_ON_ONCE(!node->nr_values))
>>                 goto out_invalid;
>>         if (WARN_ON_ONCE(node->count != node->nr_values))
>>                 goto out_invalid;
>>
>> So, this isn't considered normal and will cause warning now.
>
> Yes, I added an exception in this patch:
> -       if (WARN_ON_ONCE(node->count != node->nr_values))
> +       if (WARN_ON_ONCE(node->count != node->nr_values &&
> mapping->host != NULL))
>
> The code is not a good final solution, but the idea might not be that
> bad, list_lru provides many operations like LRU_ROTATE, we can even
> lazy remove all the nodes as a general optimization, or add a
> threshold for adding/removing a node from LRU.

We can compare different solutions.  For this one, we still need to deal
with the cases where the folio isn't removed from the swap cache, that
is, should_try_to_free_swap() returns false.

>>
>> >>
>> >> If so, we may consider some other methods.  Make shadow_nodes per-cpu?
>> >
>> > That's also an alternative solution if there are other risks.
>>
>> This appears a general optimization and more clean.
>
> I'm not sure if synchronization between CPUs will make more burden,
> because shadow nodes are globally shared, one node can be referenced
> by multiple CPUs, I can have a try to see if this is doable. Maybe a
> per-cpu batch is better but synchronization might still be an issue.

Yes.  Per-CPU shadow_nodes needs to find list from shadow node.  That
has some overhead.

If lock contention on list_lru lock is the root cause, we can use hashed
shadow node lists.  That can reduce lock contention effectively.

--
Best Regards,
Huang, Ying

next prev parent reply	other threads:[~2024-03-27  8:18 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 18:50 [RFC PATCH 00/10] mm/swap: always use swap cache for synchronization Kairui Song
2024-03-26 18:50 ` [RFC PATCH 01/10] mm/filemap: split filemap storing logic into a standalone helper Kairui Song
2024-03-26 18:50 ` [RFC PATCH 02/10] mm/swap: move no readahead swapin code to a stand-alone helper Kairui Song
2024-03-26 18:50 ` [RFC PATCH 03/10] mm/swap: convert swapin_readahead to return a folio Kairui Song
2024-03-26 20:03   ` Matthew Wilcox
2024-03-26 18:50 ` [RFC PATCH 04/10] mm/swap: remove cache bypass swapin Kairui Song
2024-03-27  6:30   ` Huang, Ying
2024-03-27  6:55     ` Kairui Song
2024-03-27  7:29       ` Huang, Ying
2024-03-26 18:50 ` [RFC PATCH 05/10] mm/swap: clean shadow only in unmap path Kairui Song
2024-03-26 18:50 ` [RFC PATCH 06/10] mm/swap: switch to use multi index entries Kairui Song
2024-03-26 18:50 ` [RFC PATCH 07/10] mm/swap: rename __read_swap_cache_async to swap_cache_alloc_or_get Kairui Song
2024-03-26 18:50 ` [RFC PATCH 08/10] mm/swap: use swap cache as a synchronization layer Kairui Song
2024-03-26 18:50 ` [RFC PATCH 09/10] mm/swap: delay the swap cache lookup for swapin Kairui Song
2024-03-26 18:50 ` [RFC PATCH 10/10] mm/swap: optimize synchronous swapin Kairui Song
2024-03-27  6:22   ` Huang, Ying
2024-03-27  6:37     ` Kairui Song
2024-03-27  6:47       ` Huang, Ying
2024-03-27  7:14         ` Kairui Song
2024-03-27  8:16           ` Huang, Ying [this message]
2024-03-27  8:08   ` Barry Song
2024-03-27  8:44     ` Kairui Song
2024-03-27  2:52 ` [RFC PATCH 00/10] mm/swap: always use swap cache for synchronization Huang, Ying
2024-03-27  3:01   ` Kairui Song
2024-03-27  8:27     ` Ryan Roberts
2024-03-27  8:32       ` Huang, Ying
2024-03-27  9:39         ` Ryan Roberts
2024-03-27 11:04       ` Kairui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87il18m6n5.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=ryncsn@gmail.com \
    --cc=sj@kernel.org \
    --cc=v-songbaohua@oppo.com \
    --cc=willy@infradead.org \
    --cc=yosryahmed@google.com \
    --cc=yuzhao@google.com \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox