From: Baolin Wang
Date: Fri, 30 May 2025 09:58:13 +0800
Subject: Re: [PATCH 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
To: Zi Yan
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
    ryan.roberts@arm.com, dev.jain@arm.com, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

On 2025/5/29 23:21, Zi Yan wrote:
> On 29 May 2025, at 4:23, Baolin Wang wrote:
>
>> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
>> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
>> will still
attempt to collapse into a shmem THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Then the current strategy is:
>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>> PMD-sized mTHP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized mTHP.
>>
>> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
>> MADV_COLLAPSE will be prohibited from collapsing PMD-sized mTHP.
>>
>> Signed-off-by: Baolin Wang
>> ---
>>  mm/huge_memory.c |  2 +-
>>  mm/shmem.c       | 12 ++++++------
>>  2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index d3e66136e41a..a8cfa37cae72 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>  	 * own flags.
>>  	 */
>>  	if (!in_pf && shmem_file(vma->vm_file))
>> -		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
>> +		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>  						   vma, vma->vm_pgoff, 0,
>>  						   !enforce_sysfs);
>
> OK, here orders is checked against allowed orders.
>
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4b42419ce6b2..4dbb28d85cd9 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -613,7 +613,7 @@ static unsigned int shmem_get_orders_within_size(struct inode *inode,
>>  }
>>
>>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>> -					      loff_t write_end, bool shmem_huge_force,
>> +					      loff_t write_end,
>>  					      struct vm_area_struct *vma,
>>  					      unsigned long vm_flags)
>>  {
>> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>  		return 0;
>>  	if (shmem_huge == SHMEM_HUGE_DENY)
>>  		return 0;
>> -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
>> +	if (shmem_huge == SHMEM_HUGE_FORCE)
>>  		return maybe_pmd_order;
>
> shmem_huge is set by sysfs?

Yes, through the '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
interface.

>>  	/*
>> @@ -860,7 +860,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
>>  }
>>
>>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>> -					      loff_t write_end, bool shmem_huge_force,
>> +					      loff_t write_end,
>>  					      struct vm_area_struct *vma,
>>  					      unsigned long vm_flags)
>>  {
>> @@ -1261,7 +1261,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
>>  				STATX_ATTR_NODUMP);
>>  	generic_fillattr(idmap, request_mask, inode, stat);
>>
>> -	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
>> +	if (shmem_huge_global_enabled(inode, 0, 0, NULL, 0))
>>  		stat->blksize = HPAGE_PMD_SIZE;
>>
>>  	if (request_mask & STATX_BTIME) {
>> @@ -1768,7 +1768,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>>  		return 0;
>>
>>  	global_orders = shmem_huge_global_enabled(inode, index, write_end,
>> -						  shmem_huge_force, vma, vm_flags);
>> +						  vma, vm_flags);
>>  	/* Tmpfs huge pages allocation */
>>  	if (!vma || !vma_is_anon_shmem(vma))
>>  		return global_orders;
>> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>>  	/* Allow mTHP that will be fully within i_size. */
>>  	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>>
>> -	if (vm_flags & VM_HUGEPAGE)
>> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>  		mask |= READ_ONCE(huge_shmem_orders_madvise);
>>
>>  	if (global_orders > 0)
>> --
>> 2.43.5
>
> shmem_huge_force comes from !enforce_sysfs in __thp_vma_allowable_orders().
> Do you know when sysfs is not enforced and why?

IIUC, shmem_huge_force will only be set during MADV_COLLAPSE. Originally,
MADV_COLLAPSE was intended to ignore the system-wide THP sysfs settings.
However, if all system-wide shmem THP settings are disabled, we should
not allow MADV_COLLAPSE to collapse a THP. This is the issue this
patchset aims to fix.

Thanks for the review.
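
To make the intended behaviour concrete, here is a minimal userspace sketch of
the anonymous-shmem decision after this patch. It is not kernel code: the
struct, the MODEL_* macros and model_allowable_orders() are hypothetical names
made up for illustration, the PMD order of 9 assumes 4K base pages with 2M
PMDs, and the i_size check behind 'within_size' is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_PMD_ORDER 9UL /* assumption: 4K base pages, 2M PMD */
#define MODEL_ORDER_BIT(order) (1UL << (order))

/* Stand-ins for the per-size sysfs bitmaps (one bit per order). */
struct model_shmem_settings {
	unsigned long always;
	unsigned long madvise;
	unsigned long within_size;
	unsigned long inherit;
	bool global_enabled; /* top-level setting yields a non-zero order mask */
};

/*
 * Simplified anonymous-shmem path after the patch: 'force' (set only for
 * MADV_COLLAPSE) acts like VM_HUGEPAGE and unlocks the 'madvise' orders,
 * but it cannot add orders that no sysfs setting enables.
 */
static unsigned long model_allowable_orders(const struct model_shmem_settings *s,
					    unsigned long requested,
					    bool force, bool vm_hugepage)
{
	unsigned long mask;

	assert(s != NULL);

	mask = s->always | s->within_size;
	if (force || vm_hugepage)
		mask |= s->madvise;
	if (s->global_enabled)
		mask |= s->inherit;

	/* Mirrors the new 'orders & ...' in __thp_vma_allowable_orders(). */
	return requested & mask;
}
```

With every field zero, a forced request for the PMD order returns 0, which is
the "never means never" case. With only the 'madvise' bit set for the PMD
order, the forced request returns that bit even without VM_HUGEPAGE.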