Message-ID: <47d189c7-3143-4b59-a3af-477d4c46a8a0@linux.alibaba.com>
Date: Thu, 20 Feb 2025 17:07:34 +0800
Subject: Re: [PATCH v2 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
To: Zi Yan
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin, linux-kernel@vger.kernel.org
References: <20250218235444.1543173-1-ziy@nvidia.com> <20250218235444.1543173-3-ziy@nvidia.com>
From: Baolin Wang <baolin.wang@linux.alibaba.com>

On 2025/2/20 00:10, Zi Yan wrote:
> On 19 Feb 2025, at 5:04, Baolin Wang wrote:
>
>> Hi Zi,
>>
>> Sorry for the late reply due to being busy with other things:)
>
> Thank you for taking a look at the patches. :)
>
>>
>> On 2025/2/19 07:54, Zi Yan wrote:
>>> During shmem_split_large_entry(), a large swap entry covers n slots
>>> and an order-0 folio needs to be inserted.
>>>
>>> Instead of splitting all n slots, only the 1 slot covered by the folio
>>> needs to be split, and the remaining n-1 shadow entries can be retained
>>> with orders ranging from 0 to n-1. This method only requires
>>> (n/XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
>>> (n/XA_CHUNK_SHIFT) new xa_nodes, compared to the original
>>> xas_split_alloc() + xas_split() approach.
>>>
>>> For example, to split an order-9 large swap entry (assuming XA_CHUNK_SHIFT
>>> is 6), 1 xa_node is needed instead of 8.
>>>
>>> xas_try_split_min_order() is used to reduce the number of calls to
>>> xas_try_split() during the split.
>>
>> For shmem swapin, if we cannot swap in the whole large folio by skipping
>> the swap cache, we split the large swap entry stored in the shmem mapping
>> into order-0 swap entries, rather than into other orders of swap entries.
>> This is because the next time we swap in a shmem folio through
>> shmem_swapin_cluster(), it will still be an order-0 folio.
>
> Right. But the swapin is one folio at a time, right? shmem_split_large_entry()

Yes, currently we always swap in one order-0 folio at a time from an async swap device. However, for a sync swap device we skip the swapcache and swap in the whole large folio (commit 1dd44c0af4fa), so shmem_split_large_entry() is not called in that case.

> should split the large swap entry and give you a slot to store the order-0 folio.
> For example, with an order-9 large swap entry, to swap in first order-0 folio,
> the large swap entry will become order-0, order-0, order-1, order-2, ... order-8,
> after the split. Then the first order-0 swap entry can be used.
> Then, when a second order-0 is swapped in, the second order-0 can be used.
> When the last order-0 is swapped in, the order-8 would be split to
> order-7, order-6, ..., order-1, order-0, order-0, and the last order-0 will be used.

Yes, understood. However, for sequential swapin scenarios, originally only one split operation was needed, whereas your approach increases the number of split operations. Of course, I understand that in non-sequential swapin scenarios your patch will save some xarray memory. It might be necessary to evaluate whether the increased number of split operations has a significant impact on the performance of sequential swapin.

> Maybe the swapin assumes after shmem_split_large_entry(), all swap entries
> are order-0, which can lead to issues. There should be some check like
> if the swap entry order > folio_order, shmem_split_large_entry() should
> be used.

>>
>> Moreover, I did a quick test with swapping in order-6 shmem folios; however, my test hung and the console was continuously filled with the following information. It seems there are some issues with shmem swapin handling. Anyway, I need more time to debug and test.

> To swap in order-6 folios, shmem_split_large_entry() does not allocate
> any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see the OOM
> error below. Let me know if there is anything I can help with.

I encountered some issues while testing order-4 and order-6 swapin with your patches. I also roughly reviewed the patch, and it seems the new swap entries stored in the shmem mapping are not correctly updated after the split. The following logic re-sets the swap entries after the split, and it assumes the large swap entry is always split to order 0. As your patch suggests, if a non-uniform split is used, then this logic for resetting the swap entries needs to be changed, right? Please correct me if I missed something.

	/*
	 * Re-set the swap entry after splitting, and the swap
	 * offset of the original large entry must be continuous.
	 */
	for (i = 0; i < 1 << order; i++) {
		pgoff_t aligned_index = round_down(index, 1 << order);
		swp_entry_t tmp;

		tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
		__xa_store(&mapping->i_pages, aligned_index + i,
			   swp_to_radix_entry(tmp), 0);
	}
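
To make the concern concrete, here is a rough and completely untested sketch of what I mean for the non-uniform case: walk the range covered by the original large entry, query each remaining entry's order (assuming xa_get_order() reflects the orders left behind by xas_try_split()), and re-store it at its own order with a swap offset recomputed from its position. Variable names follow the existing shmem_split_large_entry() code; this is only to illustrate the direction, not a tested fix.

	/*
	 * Rough, untested sketch: after a non-uniform split the remaining
	 * entries can have any order below @order, so step through them
	 * entry by entry instead of slot by slot, and preserve each
	 * entry's order when re-storing it.
	 */
	pgoff_t aligned_index = round_down(index, 1 << order);
	XA_STATE(xas, &mapping->i_pages, aligned_index);
	pgoff_t i = 0;

	while (i < (1UL << order)) {
		unsigned int entry_order =
			xa_get_order(&mapping->i_pages, aligned_index + i);
		swp_entry_t tmp;

		/* Swap offset of the original large entry is continuous. */
		tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
		xas_set_order(&xas, aligned_index + i, entry_order);
		xas_store(&xas, swp_to_radix_entry(tmp));
		i += 1UL << entry_order;
	}

As with the current loop, this would have to run under the mapping's xa_lock, and it only shows the key point: each re-stored swap entry keeps its post-split order instead of being forced to order 0.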