From: Baolin Wang
To: David Hildenbrand, akpm@linux-foundation.org, hughd@google.com
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
Date: Thu, 12 Jun 2025 15:51:47 +0800
Message-ID: <2ff65f37-efa9-4e96-9cdf-534d63ff154e@linux.alibaba.com>
In-Reply-To: <1ec368c4-c4d8-41ea-b8a3-7d1fdb3ec358@redhat.com>
References: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com> <1ec368c4-c4d8-41ea-b8a3-7d1fdb3ec358@redhat.com>

On 2025/6/11 20:34, David Hildenbrand wrote:
> On 05.06.25 10:00, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse() is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan
>> Signed-off-by: Baolin Wang
>> ---
>>   include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>   1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>                          unsigned long orders)
>>   {
>>       /* Optimization to check if required orders are enabled early. */
>> -    if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> -        unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> +    if (vma_is_anonymous(vma)) {
>> +        unsigned long always = READ_ONCE(huge_anon_orders_always);
>> +        unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> +        unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> +        unsigned long mask = always | madvise;
>> +
>> +        /*
>> +         * If the system-wide THP/mTHP sysfs settings are disabled,
>> +         * then we should never allow hugepages.
>> +         */
>> +        if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>> +            return 0;
>
> I'm still trying to digest that. Isn't there a way for us to work with the orders,
> essentially masking off all orders that are forbidden globally. Similar to below,
> if !orders, then return 0?
>
> /* Orders disabled directly. */
> orders &= ~TODO;
> /* Orders disabled by inheriting from the global toggle. */
> if (!hugepage_global_enabled())
>     orders &= ~READ_ONCE(huge_anon_orders_inherit);
>
> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could add a
> simple helper for that
>
> huge_anon_orders_never

I followed Lorenzo's suggestion to simplify the logic. Does that look more readable?

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..3087ac7631e0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
                                          unsigned long tva_flags,
                                          unsigned long orders);
 
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+                unsigned long tva_flags, unsigned long orders)
+{
+        unsigned long always = READ_ONCE(huge_anon_orders_always);
+        unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+        unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+        bool inherit_enabled = hugepage_global_enabled();
+        bool has_madvise = vm_flags & VM_HUGEPAGE;
+        unsigned long mask = always | madvise;
+
+        mask = always | madvise;
+        if (inherit_enabled)
+                mask |= inherit;
+
+        /* All set to/inherit NEVER - never means never globally, abort. */
+        if (!(mask & orders))
+                return 0;
+
+        /*
+         * Otherwise, we only enforce sysfs settings if asked. In addition,
+         * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+         * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
+         * set.
+         */
+        if (!(tva_flags & TVA_ENFORCE_SYSFS))
+                return orders;
+
+        mask = always;
+        if (has_madvise)
+                mask |= madvise;
+        if (hugepage_global_always() || (has_madvise && inherit_enabled))
+                mask |= inherit;
+
+        return orders & mask;
+}
+
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma: the vm area to check
@@ -287,19 +324,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
                                        unsigned long orders)
 {
         /* Optimization to check if required orders are enabled early. */
-        if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
-                unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
-                if (vm_flags & VM_HUGEPAGE)
-                        mask |= READ_ONCE(huge_anon_orders_madvise);
-                if (hugepage_global_always() ||
-                    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
-                        mask |= READ_ONCE(huge_anon_orders_inherit);
-
-                orders &= mask;
-                if (!orders)
-                        return 0;
-        }
+        if (vma_is_anonymous(vma))
+                orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
 
         return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
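
For anyone who wants to poke at the truth table outside the kernel, here is a rough user-space sketch of the same control flow. It is only a model under stated assumptions: struct thp_model, the MODEL_* bits, model_mask_anon_orders() and model_check_cases() are made-up names, the flag values are placeholders rather than the kernel's, and the two booleans simply stand in for hugepage_global_always() and hugepage_global_enabled(). The asserts walk through case 1, case 2 and the "madvise" rule from the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct thp_model {
	unsigned long always;	/* stands in for huge_anon_orders_always */
	unsigned long madvise;	/* stands in for huge_anon_orders_madvise */
	unsigned long inherit;	/* stands in for huge_anon_orders_inherit */
	bool global_always;	/* stands in for hugepage_global_always() */
	bool global_enabled;	/* stands in for hugepage_global_enabled() */
};

#define MODEL_VM_HUGEPAGE	0x1UL	/* placeholder bit, not the kernel value */
#define MODEL_ENFORCE_SYSFS	0x1UL	/* placeholder bit, not the kernel value */

/* Mirrors the control flow of __thp_mask_anon_orders() from the diff. */
static unsigned long model_mask_anon_orders(const struct thp_model *s,
		unsigned long vm_flags, unsigned long tva_flags,
		unsigned long orders)
{
	bool has_madvise = vm_flags & MODEL_VM_HUGEPAGE;
	unsigned long mask = s->always | s->madvise;

	if (s->global_enabled)
		mask |= s->inherit;

	/* Never means never: no setting enables any of the requested orders. */
	if (!(mask & orders))
		return 0;

	/* Callers that do not enforce sysfs skip the per-VMA checks. */
	if (!(tva_flags & MODEL_ENFORCE_SYSFS))
		return orders;

	mask = s->always;
	if (has_madvise)
		mask |= s->madvise;
	if (s->global_always || (has_madvise && s->global_enabled))
		mask |= s->inherit;

	return orders & mask;
}

/* Checks case 1, case 2 and the "madvise" rule on one example order. */
void model_check_cases(void)
{
	const unsigned long order9 = 1UL << 9;	/* arbitrary example order */
	struct thp_model s = { .inherit = order9 };

	/* Case 1: the order only inherits and the global toggle is off. */
	assert(model_mask_anon_orders(&s, 0, 0, order9) == 0);

	/* Case 2: the global toggle is on, but the order does not inherit. */
	s = (struct thp_model){ .global_always = true, .global_enabled = true };
	assert(model_mask_anon_orders(&s, 0, 0, order9) == 0);

	/* "madvise" mode: a non-enforcing caller needs no VM_HUGEPAGE ... */
	s = (struct thp_model){ .madvise = order9 };
	assert(model_mask_anon_orders(&s, 0, 0, order9) == order9);

	/* ... while an enforcing caller still requires it on the VMA. */
	assert(model_mask_anon_orders(&s, 0, MODEL_ENFORCE_SYSFS, order9) == 0);
	assert(model_mask_anon_orders(&s, MODEL_VM_HUGEPAGE,
				      MODEL_ENFORCE_SYSFS, order9) == order9);
}
```

Calling model_check_cases() from a trivial main() is enough to exercise it. The model returns 0 before it ever looks at the enforce flag, which is the ordering the helper relies on to keep "never" binding for callers that do not set TVA_ENFORCE_SYSFS.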