From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>, Kairui Song <kasong@tencent.com>,
Miaohe Lin <linmiaohe@huawei.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
Date: Thu, 20 Feb 2025 17:27:19 +0800
Message-ID: <2e4b9927-562d-4cfa-9362-f23e3bcfc454@linux.alibaba.com>
In-Reply-To: <47d189c7-3143-4b59-a3af-477d4c46a8a0@linux.alibaba.com>

On 2025/2/20 17:07, Baolin Wang wrote:
>
>
> On 2025/2/20 00:10, Zi Yan wrote:
>> On 19 Feb 2025, at 5:04, Baolin Wang wrote:
>>
>>> Hi Zi,
>>>
>>> Sorry for the late reply due to being busy with other things. :)
>>
>> Thank you for taking a look at the patches. :)
>>
>>>
>>> On 2025/2/19 07:54, Zi Yan wrote:
>>>> During shmem_split_large_entry(), a large swap entry covers n slots
>>>> and an order-0 folio needs to be inserted.
>>>>
>>>> Instead of splitting all n slots, only the 1 slot covered by the
>>>> folio needs to be split, and the remaining n-1 shadow entries can be
>>>> retained with orders ranging from 0 to n-1. This method only requires
>>>> (n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
>>>> (n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original
>>>> xas_split_alloc() + xas_split() one.
>>>>
>>>> For example, to split an order-9 large swap entry (assuming
>>>> XA_CHUNK_SHIFT
>>>> is 6), 1 xa_node is needed instead of 8.
>>>>
>>>> xas_try_split_min_order() is used to reduce the number of calls to
>>>> xas_try_split() during split.
>>>
>>> For shmem swapin, if we cannot swap in the whole large folio by
>>> skipping the swap cache, we will split the large swap entry stored in
>>> the shmem mapping into order-0 swap entries, rather than splitting it
>>> into other orders of swap entries. This is because the next time we
>>> swap in a shmem folio through shmem_swapin_cluster(), it will still
>>> be an order-0 folio.
>>
>> Right. But the swapin is one folio at a time, right?
>> shmem_split_large_entry()
>
> Yes, for now we always swap in one order-0 folio at a time from an
> async swap device. However, for a sync swap device, we skip the
> swapcache and swap in the whole large folio (see commit 1dd44c0af4fa),
> so shmem_split_large_entry() will not be called in that case.
>
>> should split the large swap entry and give you a slot to store the
>> order-0 folio. For example, with an order-9 large swap entry, to swap
>> in the first order-0 folio, the large swap entry will become order-0,
>> order-0, order-1, order-2, ..., order-8 after the split. Then the
>> first order-0 swap entry can be used. Then, when a second order-0 is
>> swapped in, the second order-0 can be used. When the last order-0 is
>> swapped in, the order-8 would be split to order-7, order-6, ...,
>> order-1, order-0, order-0, and the last order-0 will be used.
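
(Just to check I follow the sequence: below is a tiny userspace sketch
of how I understand the repeated min-order splits for the order-9
example. The min_split_order() helper is only my reading of
xas_try_split_min_order(), so treat it as illustrative, not as the
xarray's actual behavior.)

#include <stdio.h>

#define XA_CHUNK_SHIFT 6        /* as in the example above */

/* my reading of xas_try_split_min_order(); an assumption on my side */
static int min_split_order(int order)
{
        if (order % XA_CHUNK_SHIFT == 0)
                return order ? order - 1 : 0;
        return order - order % XA_CHUNK_SHIFT;
}

int main(void)
{
        int order = 9;

        /* split repeatedly at the first slot until it is order-0 */
        while (order > 0) {
                int min = min_split_order(order);

                /* each split leaves one sibling of each order min..order-1 */
                for (int o = order - 1; o >= min; o--)
                        printf("order-%d ", o);
                order = min;
        }
        printf("order-0\n");    /* the slot the folio actually uses */
        return 0;
}

This prints the ten entries from order-8 down to order-0 plus a second
order-0, i.e. the same layout as described above, just listed
high-to-low.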
>
> Yes, understood. However, in sequential swapin scenarios, originally
> only one split operation was needed, whereas your approach increases
> the number of split operations. Of course, I understand that in
> non-sequential swapin scenarios, your patch will save some xarray
> memory. It might be necessary to evaluate whether the increased number
> of split operations has a significant impact on the performance of
> sequential swapin.
>
>> Maybe the swapin code assumes that after shmem_split_large_entry(),
>> all swap entries are order-0, which can lead to issues. There should
>> be a check like: if the swap entry order > folio_order,
>> shmem_split_large_entry() should be used.
>>>
>>> Moreover, I did a quick test with swapping in order-6 shmem folios;
>>> however, my test hung, and the console was continuously filled with
>>> the following information. It seems there are some issues with the
>>> shmem swapin handling. Anyway, I need more time to debug and test.
>> To swap in order-6 folios, shmem_split_large_entry() does not allocate
>> any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see the OOM
>> error below. Let me know if there is anything I can do to help.
>
> I encountered some issues while testing order-4 and order-6 swapin
> with your patches. I roughly reviewed the patch, and it seems that the
> new swap entry stored in the shmem mapping is not correctly updated
> after the split.
>
> The following logic resets the swap entry after the split, and it
> assumes that the large swap entry is always split to order 0. As your
> patch suggests, if a non-uniform split is used, doesn't the logic for
> resetting the swap entry need to be changed as well? Please correct me
> if I missed something.
>
> /*
>  * Re-set the swap entry after splitting, and the swap
>  * offset of the original large entry must be continuous.
>  */
> for (i = 0; i < 1 << order; i++) {
>         pgoff_t aligned_index = round_down(index, 1 << order);
>         swp_entry_t tmp;
>
>         tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
>         __xa_store(&mapping->i_pages, aligned_index + i,
>                    swp_to_radix_entry(tmp), 0);
> }
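
As far as I can tell (untested, and only my rough understanding of the
non-uniform split), after xas_try_split() the old range is covered by
sub-entries of different orders, so the reset would probably have to
walk one sub-entry at a time rather than one slot at a time. A sketch
of the idea; using xa_get_order() here is an assumption on my side:

/*
 * Untested sketch: walk the split range one sub-entry at a time,
 * since the non-uniform split leaves entries of varying orders
 * covering the original range.
 */
for (i = 0; i < 1 << order; ) {
        pgoff_t head = round_down(index, 1 << order) + i;
        int sub_order = xa_get_order(&mapping->i_pages, head);
        swp_entry_t tmp;

        tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
        /* storing at any index of a multi-index entry replaces it */
        __xa_store(&mapping->i_pages, head, swp_to_radix_entry(tmp), 0);
        i += 1 << sub_order;
}

I have not verified that __xa_store() preserves the multi-index entry
here, so please take this only as a sketch of the idea.
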
In addition, after your patch, shmem_split_large_entry() seems to
always return 0 even though it splits a large swap entry, but we still
need to re-calculate the swap entry value after splitting; otherwise it
may return errors due to shmem_confirm_swap() validation failure.

/*
 * If the large swap entry has already been split, it is
 * necessary to recalculate the new swap entry based on
 * the old order alignment.
 */
if (split_order > 0) {
        pgoff_t offset = index - round_down(index, 1 << split_order);

        swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
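
One possible direction (again untested, just to illustrate what I
mean): if shmem_split_large_entry() kept returning the order of the
original large entry even with xas_try_split(), the existing
recalculation above could stay as it is at the call site:

/*
 * Hypothetical convention: return the original entry order
 * (0 if nothing was split), so the caller can still recompute
 * the swap entry for the target index as before.
 */
split_order = shmem_split_large_entry(inode, index, swap, gfp);
if (split_order < 0) {
        error = split_order;
        goto failed;
}

if (split_order > 0) {
        pgoff_t offset = index - round_down(index, 1 << split_order);

        swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}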