From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v5 6/9] mm: shmem: support large folio allocation for shmem_replace_folio()
Date: Tue, 27 Aug 2024 11:06:34 +0800
Message-ID: <5b1e9c5a-7f61-4d97-a8d7-41767ca04c77@linux.alibaba.com>
To: Hugh Dickins
Cc: akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com, chrisl@kernel.org, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com, da.gomez@samsung.com, p.raghav@samsung.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
On 2024/8/26 06:05, Hugh Dickins wrote:
> On Mon, 12 Aug 2024, Baolin Wang wrote:
>
>> To support large folio swapin for shmem in the following patches, add
>> large folio allocation for the new replacement folio in
>> shmem_replace_folio(). Moreover, large folios occupy N consecutive
>> entries in the swap cache instead of using multi-index entries like
>> the page cache, so we should replace each of those consecutive
>> entries in the swap cache rather than using shmem_replace_entry().
>>
>> Also update the statistics and the folio reference count based on
>> the number of pages in the folio.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/shmem.c | 54 +++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 31 insertions(+), 23 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index f6bab42180ea..d94f02ad7bd1 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1889,28 +1889,24 @@ static bool shmem_should_replace_folio(struct folio *folio, gfp_t gfp)
>>  static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
>>  				struct shmem_inode_info *info, pgoff_t index)
>>  {
>> -	struct folio *old, *new;
>> -	struct address_space *swap_mapping;
>> -	swp_entry_t entry;
>> -	pgoff_t swap_index;
>> -	int error;
>> -
>> -	old = *foliop;
>> -	entry = old->swap;
>> -	swap_index = swap_cache_index(entry);
>> -	swap_mapping = swap_address_space(entry);
>> +	struct folio *new, *old = *foliop;
>> +	swp_entry_t entry = old->swap;
>> +	struct address_space *swap_mapping = swap_address_space(entry);
>> +	pgoff_t swap_index = swap_cache_index(entry);
>> +	XA_STATE(xas, &swap_mapping->i_pages, swap_index);
>> +	int nr_pages = folio_nr_pages(old);
>> +	int error = 0, i;
>>
>>  	/*
>>  	 * We have arrived here because our zones are constrained, so don't
>>  	 * limit chance of success by further cpuset and node constraints.
>>  	 */
>>  	gfp &= ~GFP_CONSTRAINT_MASK;
>> -	VM_BUG_ON_FOLIO(folio_test_large(old), old);
>> -	new = shmem_alloc_folio(gfp, 0, info, index);
>> +	new = shmem_alloc_folio(gfp, folio_order(old), info, index);
>
> It is not clear to me whether folio_order(old) will ever be more than 0
> here: but if it can be, then care will need to be taken over the gfp flags,

With this patch set, it can be a large folio: if a large folio still
exists in the swap cache, we will get a large folio during swapin. And
yes, the gfp flags should be updated. How about the following fix?

> that they are suited to allocating the large folio; and there will need to
> be (could be awkward!) fallback to order 0 when that allocation fails.

I do not think we should fall back to order 0 for a large folio, since
that would introduce more complex logic: for example, we would have to
split the original large swap entries in the shmem mapping, and freeing
large swap entries is tricky, etc. So I want to keep it simple for now.

> My own testing never comes to shmem_replace_folio(): it was originally for
> one lowend graphics driver; but IIRC there's now a more common case for it.

Good to know. Thank you very much for your valuable input.

[PATCH] mm: shmem: fix the gfp flag for large folio allocation

In shmem_replace_folio(), it may be necessary to allocate a large
folio, so we should update the gfp flags to ensure they are suitable
for allocating the large folio.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/shmem.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index dd384d4ab035..d8038a66b110 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -155,7 +155,7 @@ static unsigned long shmem_default_max_inodes(void)
 
 static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
-			struct mm_struct *fault_mm, vm_fault_t *fault_type);
+			struct vm_area_struct *vma, vm_fault_t *fault_type);
 
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
@@ -1887,7 +1887,8 @@ static bool shmem_should_replace_folio(struct folio *folio, gfp_t gfp)
 }
 
 static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
-				struct shmem_inode_info *info, pgoff_t index)
+				struct shmem_inode_info *info, pgoff_t index,
+				struct vm_area_struct *vma)
 {
 	struct folio *new, *old = *foliop;
 	swp_entry_t entry = old->swap;
@@ -1902,6 +1903,12 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 	 * limit chance of success by further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
+	if (nr_pages > 1) {
+		gfp_t huge_gfp = vma_thp_gfp_mask(vma);
+
+		gfp = limit_gfp_mask(huge_gfp, gfp);
+	}
+
 	new = shmem_alloc_folio(gfp, folio_order(old), info, index);
 	if (!new)
 		return -ENOMEM;
@@ -2073,10 +2080,11 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
  */
 static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			struct folio **foliop, enum sgp_type sgp,
-			gfp_t gfp, struct mm_struct *fault_mm,
+			gfp_t gfp, struct vm_area_struct *vma,
 			vm_fault_t *fault_type)
 {
 	struct address_space *mapping = inode->i_mapping;
+	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
@@ -2162,7 +2170,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	arch_swap_restore(folio_swap(swap, folio), folio);
 
 	if (shmem_should_replace_folio(folio, gfp)) {
-		error = shmem_replace_folio(&folio, gfp, info, index);
+		error = shmem_replace_folio(&folio, gfp, info, index, vma);
 		if (error)
 			goto failed;
 	}
@@ -2243,7 +2251,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
 	if (xa_is_value(folio)) {
 		error = shmem_swapin_folio(inode, index, &folio,
-					sgp, gfp, fault_mm, fault_type);
+					sgp, gfp, vma, fault_type);
 		if (error == -EEXIST)
 			goto repeat;
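
For completeness, the interesting step in the fix is
gfp = limit_gfp_mask(huge_gfp, gfp): it keeps the THP-style allocation
behaviour derived from vma_thp_gfp_mask(vma) while never allocating
more aggressively than the caller's original mask allows. Below is a
minimal sketch of that combination; the function name is hypothetical,
and the real limit_gfp_mask() in mm/shmem.c additionally preserves the
caller's zone bits and treats __GFP_IO/__GFP_FS the same way it treats
the reclaim bits here:

#include <linux/gfp.h>

/*
 * sketch_limit_gfp_mask(): illustrative only, not a real kernel symbol.
 * Models what limit_gfp_mask(huge_gfp, gfp) achieves for the fix above.
 */
static gfp_t sketch_limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
	/* Base: the vma's THP allocation mask, minus its reclaim bits. */
	gfp_t result = huge_gfp & ~__GFP_RECLAIM;

	/* Union of deny flags: be at least as cautious as the caller. */
	result |= limit_gfp & denyflags;
	/* Intersect reclaim flags: reclaim only if both masks allow it. */
	result |= (huge_gfp & limit_gfp) & __GFP_RECLAIM;

	return result;
}

The net effect is that an order-0 swapin keeps the caller's gfp
unchanged, while a large-folio replacement gets THP-appropriate flags
without escaping the constraints the caller imposed.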