From: zhiguojiang <justinjiang@vivo.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Chris Li <chrisl@kernel.org>,
opensource.kernel@vivo.com
Subject: Re: [PATCH] mm: swap: mTHP frees entries as a whole
Date: Tue, 6 Aug 2024 15:40:52 +0800 [thread overview]
Message-ID: <dee6bf7c-ae73-435b-a6d5-ae966dfec048@vivo.com> (raw)
In-Reply-To: <CAGsJ_4zNd5oCG1vpWRJxOQgPRvyO3AbjGM5nt9SxGjm=YTcrdg@mail.gmail.com>
On 2024/8/6 10:07, Barry Song wrote:
> On Tue, Aug 6, 2024 at 2:01 PM zhiguojiang <justinjiang@vivo.com> wrote:
>>
>>
>> On 2024/8/6 6:09, Barry Song wrote:
>>> On Tue, Aug 6, 2024 at 4:08 AM Zhiguo Jiang <justinjiang@vivo.com> wrote:
>>>> Support freeing an mTHP's swap entries as a whole, which avoids
>>>> frequent swap_info locking for every individual entry in
>>>> swapcache_free_entries(). When the swap_map count values of all
>>>> contiguous entries are zero (excluding SWAP_HAS_CACHE), the entries
>>>> are freed directly, skipping the percpu swp_slots caches.
>>>>
>>> No, this isn't quite good. Please review the work done by Chris and Kairui[1];
>>> they have handled it better. On a different note, I have a patch that can
>>> handle zap_pte_range() for swap entries in batches[2][3].
>> I'm glad to see your optimized submission about batch freeing swap
>> entries for zap_pte_range(); sorry, I didn't see it before. This
>> patch of mine can be ignored.
> no worries, please help test and review the formal patch I sent:
> https://lore.kernel.org/linux-mm/20240806012409.61962-1-21cnbao@gmail.com/
I believe it's correct and valuable. Looking forward to seeing it merged soon.
>
> Please note that I didn't use a bitmap, to avoid a large stack, and
> there is a real possibility that the condition below occurs; your
> patch can crash when it is true:
> nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER
>
> Additionally, I quickly skip the case where
> swap_count(data_race(si->swap_map[start_offset])) != 1 to avoid
> regressions in cases that can't be batched.
>
>> Thanks
>> Zhiguo
>>
>>> [1] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-5-cb9c148b9297@kernel.org/
>>> [2] https://lore.kernel.org/linux-mm/20240803091118.84274-1-21cnbao@gmail.com/
>>> [3] https://lore.kernel.org/linux-mm/CAGsJ_4wPnQqKOHx6iQcwO8bQzoBXKr2qY2AgSxMwTQCj3-8YWw@mail.gmail.com/
>>>
>>>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>>>> ---
>>>> mm/swapfile.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 61 insertions(+)
>>>>
>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>>> index ea023fc25d08..829fb4cfb6ec
>>>> --- a/mm/swapfile.c
>>>> +++ b/mm/swapfile.c
>>>> @@ -1493,6 +1493,58 @@ static void swap_entry_range_free(struct swap_info_struct *p, swp_entry_t entry,
>>>> swap_range_free(p, offset, nr_pages);
>>>> }
>>>>
>>>> +/*
>>>> + * Free the contiguous swap entries as a whole; callers must
>>>> + * ensure all entries belong to the same folio.
>>>> + */
>>>> +static void swap_entry_range_check_and_free(struct swap_info_struct *p,
>>>> +		swp_entry_t entry, int nr, bool *any_only_cache)
>>>> +{
>>>> +	const unsigned long start_offset = swp_offset(entry);
>>>> +	const unsigned long end_offset = start_offset + nr;
>>>> +	unsigned long offset;
>>>> +	DECLARE_BITMAP(to_free, SWAPFILE_CLUSTER) = { 0 };
>>>> +	struct swap_cluster_info *ci;
>>>> +	int i = 0, nr_setbits = 0;
>>>> +	unsigned char count;
>>>> +
>>>> +	/*
>>>> +	 * Free and check the swap_map count values of all contiguous
>>>> +	 * entries in the whole folio range.
>>>> +	 */
>>>> +	WARN_ON_ONCE(nr > SWAPFILE_CLUSTER);
>>>> +	ci = lock_cluster_or_swap_info(p, start_offset);
>>>> +	for (offset = start_offset; offset < end_offset; offset++, i++) {
>>>> +		if (data_race(p->swap_map[offset])) {
>>>> +			count = __swap_entry_free_locked(p, offset, 1);
>>>> +			if (!count) {
>>>> +				bitmap_set(to_free, i, 1);
>>>> +				nr_setbits++;
>>>> +			} else if (count == SWAP_HAS_CACHE) {
>>>> +				*any_only_cache = true;
>>>> +			}
>>>> +		} else {
>>>> +			WARN_ON_ONCE(1);
>>>> +		}
>>>> +	}
>>>> +	unlock_cluster_or_swap_info(p, ci);
>>>> +
>>>> +	/*
>>>> +	 * If the swap_map count values of all contiguous entries are
>>>> +	 * zero (excluding SWAP_HAS_CACHE), the entries are freed
>>>> +	 * directly, skipping the percpu swp_slots caches, which avoids
>>>> +	 * frequent swap_info locking for every individual entry.
>>>> +	 */
>>>> +	if (nr > 1 && nr_setbits == nr) {
>>>> +		spin_lock(&p->lock);
>>>> +		swap_entry_range_free(p, entry, nr);
>>>> +		spin_unlock(&p->lock);
>>>> +	} else {
>>>> +		for_each_set_bit(i, to_free, SWAPFILE_CLUSTER)
>>>> +			free_swap_slot(swp_entry(p->type, start_offset + i));
>>>> +	}
>>>> +}
>>>> +
>>>> static void cluster_swap_free_nr(struct swap_info_struct *sis,
>>>> unsigned long offset, int nr_pages,
>>>> unsigned char usage)
>>>> @@ -1808,6 +1860,14 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
>>>> if (WARN_ON(end_offset > si->max))
>>>> goto out;
>>>>
>>>> +	/*
>>>> +	 * Try to free all contiguous entries of an mTHP as a whole.
>>>> +	 */
>>>> +	if (IS_ENABLED(CONFIG_THP_SWAP) && nr > 1) {
>>>> +		swap_entry_range_check_and_free(si, entry, nr, &any_only_cache);
>>>> +		goto free_cache;
>>>> +	}
>>>> +
>>>> /*
>>>> * First free all entries in the range.
>>>> */
>>>> @@ -1821,6 +1881,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
>>>> }
>>>> }
>>>>
>>>> +free_cache:
>>>> /*
>>>> * Short-circuit the below loop if none of the entries had their
>>>> * reference drop to zero.
>>>> --
>>>> 2.39.0
>>>>
> Thanks
> Barry
Thanks
Zhiguo