Message-ID: <2e4b9927-562d-4cfa-9362-f23e3bcfc454@linux.alibaba.com>
Date: Thu, 20 Feb 2025 17:27:19 +0800
Subject: Re: [PATCH v2 2/2] mm/shmem: use xas_try_split() in shmem_split_large_entry()
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 Andrew Morton, Hugh Dickins, Kairui Song, Miaohe Lin,
 linux-kernel@vger.kernel.org
References: <20250218235444.1543173-1-ziy@nvidia.com>
 <20250218235444.1543173-3-ziy@nvidia.com>
 <47d189c7-3143-4b59-a3af-477d4c46a8a0@linux.alibaba.com>
In-Reply-To: <47d189c7-3143-4b59-a3af-477d4c46a8a0@linux.alibaba.com>
On 2025/2/20 17:07, Baolin Wang wrote:
>
>
> On 2025/2/20 00:10, Zi Yan wrote:
>> On 19 Feb 2025, at 5:04, Baolin Wang wrote:
>>
>>> Hi Zi,
>>>
>>> Sorry for the late reply due to being busy with other things:)
>>
>> Thank you for taking a look at the patches. :)
>>
>>>
>>> On 2025/2/19 07:54, Zi Yan wrote:
>>>> During shmem_split_large_entry(), a large swap entry covering n slots
>>>> is present and an order-0 folio needs to be inserted.
>>>>
>>>> Instead of splitting all n slots, only the 1 slot covered by the folio
>>>> needs to be split, and the remaining n-1 shadow entries can be
>>>> retained with orders ranging from 0 to n-1. This method only requires
>>>> (n / XA_CHUNK_SHIFT) new xa_nodes instead of (n % XA_CHUNK_SHIFT) *
>>>> (n / XA_CHUNK_SHIFT) new xa_nodes, compared to the original
>>>> xas_split_alloc() + xas_split() approach.
>>>>
>>>> For example, to split an order-9 large swap entry (assuming
>>>> XA_CHUNK_SHIFT is 6), 1 xa_node is needed instead of 8.
>>>>
>>>> xas_try_split_min_order() is used to reduce the number of calls to
>>>> xas_try_split() during the split.
>>>
>>> For shmem swapin, if we cannot swap in the whole large folio by
>>> skipping the swap cache, we will split the large swap entry stored in
>>> the shmem mapping into order-0 swap entries, rather than splitting it
>>> into other orders of swap entries. This is because the next time we
>>> swap in a shmem folio through shmem_swapin_cluster(), it will still
>>> be an order-0 folio.
>>
>> Right. But the swapin is one folio at a time, right?
>> shmem_split_large_entry()
>
> Yes, we now always swap in an order-0 folio at a time from an async
> swap device. However, for a sync swap device, we skip the swapcache and
> swap in the whole large folio (see commit 1dd44c0af4fa), so
> shmem_split_large_entry() is not called in that case.
>
>> should split the large swap entry and give you a slot to store the
>> order-0 folio. For example, with an order-9 large swap entry, to swap
>> in the first order-0 folio, the large swap entry will become order-0,
>> order-0, order-1, order-2, ... order-8 after the split. Then the first
>> order-0 swap entry can be used.
>> Then, when a second order-0 folio is swapped in, the second order-0
>> entry can be used.
>> When the last order-0 folio is swapped in, the order-8 entry would be
>> split into order-7, order-6, ..., order-1, order-0, order-0, and the
>> last order-0 entry will be used.
>
> Yes, understood. However, for sequential swapin scenarios, originally
> only one split operation was needed, whereas your approach increases
> the number of split operations. Of course, I understand that in
> non-sequential swapin scenarios, your patch will save some xarray
> memory. It might be necessary to evaluate whether the increased number
> of split operations has a significant impact on the performance of
> sequential swapin.
>
>> Maybe the swapin assumes that after shmem_split_large_entry(), all
>> swap entries are order-0, which can lead to issues. There should be
>> some check like: if the swap entry order > folio order, then
>> shmem_split_large_entry() should be used.
>>>
>>> Moreover, I did a quick test with swapping in order-6 shmem folios;
>>> however, my test hung, and the console was continuously filled with
>>> the following information. It seems there are some issues with shmem
>>> swapin handling. Anyway, I need more time to debug and test.
>>
>> To swap in order-6 folios, shmem_split_large_entry() does not allocate
>> any new xa_node, since XA_CHUNK_SHIFT is 6. It is weird to see the OOM
>> error below. Let me know if there is anything I can help with.
>
> I encountered some issues while testing order-4 and order-6 swapin with
> your patches. I roughly reviewed the patch, and it seems that the new
> swap entry stored in the shmem mapping was not correctly updated after
> the split.
>
> The following logic is to reset the swap entry after the split, and I
> assume that the large swap entry was always split to order 0 before. As
> your patch suggests, if a non-uniform split is used, then the logic for
> resetting the swap entry needs to be changed? Please correct me if I
> missed something.
>
> /*
>  * Re-set the swap entry after splitting, and the swap
>  * offset of the original large entry must be continuous.
>  */
> for (i = 0; i < 1 << order; i++) {
>     pgoff_t aligned_index = round_down(index, 1 << order);
>     swp_entry_t tmp;
>
>     tmp = swp_entry(swp_type(swap), swp_offset(swap) + i);
>     __xa_store(&mapping->i_pages, aligned_index + i,
>            swp_to_radix_entry(tmp), 0);
> }

In addition, after your patch, shmem_split_large_entry() seems to always
return 0 even when it splits a large swap entry, but we still need to
re-calculate the swap entry value after splitting; otherwise it may
return errors due to the shmem_confirm_swap() validation failing.

/*
 * If the large swap entry has already been split, it is
 * necessary to recalculate the new swap entry based on
 * the old order alignment.
 */
if (split_order > 0) {
	pgoff_t offset = index - round_down(index, 1 << split_order);

	swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}