From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>, Kairui Song <kasong@tencent.com>,
Miaohe Lin <linmiaohe@huawei.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
Date: Thu, 20 Feb 2025 17:27:19 +0800
Message-ID: <2e4b9927-562d-4cfa-9362-f23e3bcfc454@linux.alibaba.com>
In-Reply-To: <47d189c7-3143-4b59-a3af-477d4c46a8a0@linux.alibaba.com>

On 2025/2/20 17:07, Baolin Wang wrote:
>
>
> On 2025/2/20 00:10, Zi Yan wrote:
>> On 19 Feb 2025, at 5:04, Baolin Wang wrote:
>>
>>> Hi Zi,
>>>
>>> Sorry for the late reply due to being busy with other things. :)
>>
>> Thank you for taking a look at the patches. :)
>>
>>>
>>> On 2025/2/19 07:54, Zi Yan wrote:
>>>> During shmem_split_large_entry(), a large swap entry covers n slots
>>>> and an order-0 folio needs to be inserted.
>>>>
>>>> Instead of splitting all n slots, only the 1 slot covered by the
>>>> folio needs to be split, and the remaining n-1 shadow entries can be
>>>> retained with orders ranging from 0 to n-1. This method only requires
>>>> (n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
>>>> (n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original
>>>> xas_split_alloc() + xas_split() one.
>>>>
>>>> For example, to split an order-9 large swap entry (assuming
>>>> XA_CHUNK_SHIFT
>>>> is 6), 1 xa_node is needed instead of 8.
>>>>
>>>> xas_try_split_min_order() is used to reduce the number of calls to
>>>> xas_try_split() during split.
>>>
>>> For shmem swapin, if we cannot swap in the whole large folio by
>>> skipping the swap cache, we will split the large swap entry stored in
>>> the shmem mapping into order-0 swap entries, rather than splitting it
>>> into other orders of swap entries. This is because the next time we
>>> swap in a shmem folio through shmem_swapin_cluster(), it will still
>>> be an order-0 folio.
>>
>> Right. But the swapin is one folio at a time, right?
>> shmem_split_large_entry()
>
> Yes, for now we always swap in one order-0 folio at a time from an
> async swap device. However, for a sync swap device, we skip the
> swapcache and swap in the whole large folio (see commit 1dd44c0af4fa),
> so shmem_split_large_entry() will not be called in that case.
>
>> should split the large swap entry and give you a slot to store the
>> order-0 folio. For example, with an order-9 large swap entry, to swap
>> in the first order-0 folio, the large swap entry will become order-0,
>> order-0, order-1, order-2, ..., order-8 after the split. Then the
>> first order-0 swap entry can be used. Then, when a second order-0 is
>> swapped in, the second order-0 can be used. When the last order-0 is
>> swapped in, the order-8 would be split to order-7, order-6, ...,
>> order-1, order-0, order-0, and the last order-0 will be used.
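
(Just to check I follow the sequence: below is a tiny userspace sketch
of how I understand the repeated min-order splits for the order-9
example. The min_split_order() helper is only my reading of
xas_try_split_min_order(), so treat it as illustrative, not as the
xarray's actual behavior.)

#include <stdio.h>

#define XA_CHUNK_SHIFT 6        /* as in the example above */

/* my reading of xas_try_split_min_order(); an assumption on my side */
static int min_split_order(int order)
{
        if (order % XA_CHUNK_SHIFT == 0)
                return order ? order - 1 : 0;
        return order - order % XA_CHUNK_SHIFT;
}

int main(void)
{
        int order = 9;

        /* split repeatedly at the first slot until it is order-0 */
        while (order > 0) {
                int min = min_split_order(order);

                /* each split leaves one sibling of each order min..order-1 */
                for (int o = order - 1; o >= min; o--)
                        printf("order-%d ", o);
                order = min;
        }
        printf("order-0\n");    /* the slot the folio actually uses */
        return 0;
}

This prints the ten entries from order-8 down to order-0 plus a second
order-0, i.e. the same layout as described above, just listed
high-to-low.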
>
> Yes, understood. However, in sequential swapin scenarios, originally
> only one split operation was needed, whereas your approach increases
> the number of split operations. Of course, I understand that in
> non-sequential swapin scenarios, your patch will save some xarray
> memory. It might be necessary to evaluate whether the increased number
> of split operations has a significant impact on the performance of
> sequential swapin.
>
>> Maybe the swapin code assumes that after shmem_split_large_entry(),
>> all swap entries are order-0, which can lead to issues. There should
>> be a check like: if the swap entry order > folio_order,
>> shmem_split_large_entry() should be used.
>>>
>>> Moreover, I did a quick test with swapping in order-6 shmem folios;
>>> however, my test hung, and the console was continuously filled with
>>> the following information. It seems there are some issues with the
>>> shmem swapin handling. Anyway, I need more time to debug and test.
>> To swap in order-6 folios, shmem_split_large_entry() does not allocate
>> any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see the OOM
>> error below. Let me know if there is anything I can do to help.
>
> I encountered some issues while testing order-4 and order-6 swapin
> with your patches. I roughly reviewed the patch, and it seems that the
> new swap entry stored in the shmem mapping is not correctly updated
> after the split.
>
> The following logic resets the swap entry after the split, and it
> assumes that the large swap entry is always split to order 0. As your
> patch suggests, if a non-uniform split is used, doesn't the logic for
> resetting the swap entry need to be changed as well? Please correct me
> if I missed something.
>
> /*
>  * Re-set the swap entry after splitting, and the swap
>  * offset of the original large entry must be continuous.
>  */
> for (i = 0; i < 1 << order; i++) {
>         pgoff_t aligned_index = round_down(index, 1 << order);
>         swp_entry_t tmp;
>
>         tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
>         __xa_store(&mapping->i_pages, aligned_index + i,
>                    swp_to_radix_entry(tmp), 0);
> }
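
As far as I can tell (untested, and only my rough understanding of the
non-uniform split), after xas_try_split() the old range is covered by
sub-entries of different orders, so the reset would probably have to
walk one sub-entry at a time rather than one slot at a time. A sketch
of the idea; using xa_get_order() here is an assumption on my side:

/*
 * Untested sketch: walk the split range one sub-entry at a time,
 * since the non-uniform split leaves entries of varying orders
 * covering the original range.
 */
for (i = 0; i < 1 << order; ) {
        pgoff_t head = round_down(index, 1 << order) + i;
        int sub_order = xa_get_order(&mapping->i_pages, head);
        swp_entry_t tmp;

        tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
        /* storing at any index of a multi-index entry replaces it */
        __xa_store(&mapping->i_pages, head, swp_to_radix_entry(tmp), 0);
        i += 1 << sub_order;
}

I have not verified that __xa_store() preserves the multi-index entry
here, so please take this only as a sketch of the idea.
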
In addition, after your patch, shmem_split_large_entry() seems to
always return 0 even though it splits a large swap entry, but we still
need to re-calculate the swap entry value after splitting; otherwise it
may return errors due to shmem_confirm_swap() validation failure.

/*
 * If the large swap entry has already been split, it is
 * necessary to recalculate the new swap entry based on
 * the old order alignment.
 */
if (split_order > 0) {
        pgoff_t offset = index - round_down(index, 1 << split_order);

        swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
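
One possible direction (again untested, just to illustrate what I
mean): if shmem_split_large_entry() kept returning the order of the
original large entry even with xas_try_split(), the existing
recalculation above could stay as it is at the call site:

/*
 * Hypothetical convention: return the original entry order
 * (0 if nothing was split), so the caller can still recompute
 * the swap entry for the target index as before.
 */
split_order = shmem_split_large_entry(inode, index, swap, gfp);
if (split_order < 0) {
        error = split_order;
        goto failed;
}

if (split_order > 0) {
        pgoff_t offset = index - round_down(index, 1 << split_order);

        swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}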