From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 205F7C71136 for ; Thu, 12 Jun 2025 13:25:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 95A3F6B007B; Thu, 12 Jun 2025 09:25:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 90AF56B0088; Thu, 12 Jun 2025 09:25:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 820B96B0089; Thu, 12 Jun 2025 09:25:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 621B86B007B for ; Thu, 12 Jun 2025 09:25:54 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 0BB785F7BF for ; Thu, 12 Jun 2025 13:25:54 +0000 (UTC) X-FDA: 83546821428.09.4C8BBCC Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130]) by imf19.hostedemail.com (Postfix) with ESMTP id 2A8461A0014 for ; Thu, 12 Jun 2025 13:25:49 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=s9xy4l3a; spf=pass (imf19.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749734752; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=vYF/r51uFCbhClrqwieaK0VW0rIZWeBxCvWde2d0ef4=; b=HNvtZZIan7IRCMSTvKteIPMPe4dH40sJlqHarA1CMx2m4mp3fARi4hPQMxKE246oJTuWGj gCntNONrM3qNV6a9AVKC5MkgRSFHqGDVVhfXUertrBPXAc/5tO4R7lCXoivBEOsjf4wMpU TgKPsEqKmYJ/3h+Z9voGoLtuDdQSuIg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749734752; a=rsa-sha256; cv=none; b=Gq3MjqqI7ZBFjYUGo7LZbG4tmfj7wzZjSUogvysmTgzIkrOFqVSqj7k5mTjqLRvx44CkTO w/rVmmHDmAvICFpP+TduMIXLIWJ/DG5FDgcP4yvjozMSIZI9/7RWfwniVQA7EEK1SUDb4a CZaGWTnjA8HCLLRAJ6u4W+xDYv0SXD4= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=s9xy4l3a; spf=pass (imf19.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.130 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1749734747; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=vYF/r51uFCbhClrqwieaK0VW0rIZWeBxCvWde2d0ef4=; b=s9xy4l3a7yE9jXXePTfMCU8xzSbLe4pCHC2b48uyu3cxknx+PBKX6oDTINV8bTZ9Y1D4gBXKmxt2M2odc+9lsusLzZVCjI3p3svo1BSJVVtV7axZknauPXhLL6QuXbBL2zeCjWlRFdqsBLTF48dkjXS4kxzRwtdx9bR8eAC11jM= Received: from 30.39.247.88(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0WdhBJVR_1749734743 cluster:ay36) by smtp.aliyun-inc.com; Thu, 12 Jun 2025 21:25:44 +0800 Message-ID: Date: Thu, 12 Jun 2025 21:25:43 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled To: David Hildenbrand , akpm@linux-foundation.org, hughd@google.com Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com> <1ec368c4-c4d8-41ea-b8a3-7d1fdb3ec358@redhat.com> <2ff65f37-efa9-4e96-9cdf-534d63ff154e@linux.alibaba.com> <953596b2-8749-493d-97eb-a5d8995d9ef8@redhat.com> <97a67b74-d473-455e-a05e-c85fe45da008@linux.alibaba.com> From: Baolin Wang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 2A8461A0014 X-Stat-Signature: tt5qgkhcftb4mecoec4deurkdex545rj X-HE-Tag: 1749734749-970386 X-HE-Meta: U2FsdGVkX18ExWKJ/yxQLQdx2GvcH8ShQkXbLy9vM8qRiQESlsasog7xy2JXv3cyNOUYOBd3NOIrSA6nhxIAiEBn/qhEkdrz1Y3B2McSe7eTO7v5/IVUVEf+xw+MAkSQp4uYdx2PEI0ABF0Kd6cfGsC8Um5BKOo1o0S7814J/5+uMw4c3qXUoNB+S/C2oBI8CQWAtasH5VKqlkQj2GEigglcli7w9MtjwiHl4RsEKMsZwPWIRjIyicvlXqKScjHXIRX4FViz7Kd2DtpE8TO90nJ7OiEqEeyTxTcD9WSp27fxX1cJtZNj3SFhFEQv4YuHgx5UgYfMlndrs8GH7NmLT+4LzoS+YFwgiF2YzPyXKA00lp7SYWTp1romiVvISq+U6LJCU+iM8E+ebtBrQINSkr1AfPxi/xjsD0KQhfg3AcS6a6ZP5BZwRMG3drWqKrisObJCG7gmxxeaL7nD2b5+VQlRnqffJHX61JEpNXNqtHqxVZ/THNmfqHLKGBUDGqnYWZCICYn/51u3e5w2xk79TsNr62s15nKgr0ZqLEOg7LEtOYxcHFM4DkT9iGaMdAGb60DqvzwvXVnUjPPkMAlhug/m+H33EgV+oYsyggRGfkffa2BpoJcInhfOAhgSeK4nPBO1GImua45VzrSRgwQq7YtqmqKAgFw/LU5oTKTPsYFqzUj9bHOC92EEuG3Wf7PfCHbF+jbduz8U897y83DGzTK1R9FXxmNotIlZMHOVBjDIwAz6Tm31teT2AFtnHMuMve2FUTVCYMWQvYHZH+zpWO2HPdnBJxZ7+Ni3e+LVIROPHong7BGKcqag6Zwjks3Gm/WIjNqUodG7sE+GtIK55mVYAAanVFvCd5Cosbnte2+BG6d/jcMq7RU4fnxHKkLS97Qm+1or59NXZj1A3J2FmxMr1pMGueThNP4NInxB3/onF9nc23OgY7LAj3XkmKa/MljrhG169XSzVlFpoz/ 0Q+I17BD JySeQd/HdY1rhQGTkk4pP7NUt/vli3h9kETWCcBtqjAGaHTkDLN3lD98eEbrdyx9nYjQGxWeGk2ceh1TBkjzAHC4euAn8WJxWSG/DR+xWc7AqmjKm3Ok94DdRhRaHf+khbkB/PhjcDpOq3TvetZcPtZT5ZigNIuQpf8Y2JkwMnB93eltNJFGGQma6b7AblouXu3oxXDBZ5fwBol5cAHD9/DTV04nP0/RA9h4tXlb+N4Vl049GtaZ9+bKec6oP0hSmJMf9tq0JMzILV6wL4ZXkeMkvTSPxYYZzwyA2EsVsNK8wm8uPD4g/njVQdqseB7xJuFzZhg3UjxQXCEj0K+A2vGE/IsOLhOQujRfKdoGJfrwqrmrfvhIGXYATINoV7niSR/Zxx/thfqziZwfdwaJKr+UpLQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2025/6/12 21:05, David Hildenbrand wrote: > On 12.06.25 14:45, Baolin Wang wrote: >> >> >> On 2025/6/12 16:51, David Hildenbrand wrote: >>> On 12.06.25 09:51, Baolin Wang wrote: >>>> >>>> >>>> On 2025/6/11 20:34, David Hildenbrand wrote: >>>>> On 05.06.25 10:00, Baolin Wang wrote: >>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs >>>>>> settings, >>>>>> which >>>>>> means that even though we have disabled the Anon THP configuration, >>>>>> MADV_COLLAPSE >>>>>> will still attempt to collapse into a Anon THP. This violates the >>>>>> rule >>>>>> we have >>>>>> agreed upon: never means never. >>>>>> >>>>>> Another rule for madvise, referring to David's suggestion: “allowing >>>>>> for collapsing >>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". >>>>>> >>>>>> To address this issue, should check whether the Anon THP >>>>>> configuration >>>>>> is disabled >>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS >>>>>> flag is >>>>>> set. >>>>>> >>>>>> In summary, the current strategy is: >>>>>> >>>>>> 1. If always & orders == 0, and madvise & orders == 0, and >>>>>> hugepage_global_enabled() == false >>>>>> (global THP settings are not enabled), it means mTHP of that orders >>>>>> are prohibited >>>>>> from being used, then madvise_collapse() is forbidden for that >>>>>> orders. >>>>>> >>>>>> 2. If always & orders == 0, and madvise & orders == 0, and >>>>>> hugepage_global_enabled() == true >>>>>> (global THP settings are enabled), and inherit & orders == 0, it >>>>>> means >>>>>> mTHP of that >>>>>> orders are still prohibited from being used, thus madvise_collapse() >>>>>> is not allowed >>>>>> for that orders. >>>>>> >>>>>> Reviewed-by: Zi Yan >>>>>> Signed-off-by: Baolin Wang >>>>>> --- >>>>>>     include/linux/huge_mm.h | 23 +++++++++++++++++++---- >>>>>>     1 file changed, 19 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>>> index 2f190c90192d..199ddc9f04a1 100644 >>>>>> --- a/include/linux/huge_mm.h >>>>>> +++ b/include/linux/huge_mm.h >>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct >>>>>> vm_area_struct *vma, >>>>>>                            unsigned long orders) >>>>>>     { >>>>>>         /* Optimization to check if required orders are enabled >>>>>> early. */ >>>>>> -    if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { >>>>>> -        unsigned long mask = READ_ONCE(huge_anon_orders_always); >>>>>> +    if (vma_is_anonymous(vma)) { >>>>>> +        unsigned long always = READ_ONCE(huge_anon_orders_always); >>>>>> +        unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >>>>>> +        unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >>>>>> +        unsigned long mask = always | madvise; >>>>>> + >>>>>> +        /* >>>>>> +         * If the system-wide THP/mTHP sysfs settings are disabled, >>>>>> +         * then we should never allow hugepages. >>>>>    > +         */> +        if (!(mask & orders) && >>>>> !(hugepage_global_enabled() && (inherit & orders))) >>>>>> +            return 0; >>>>> >>>>> I'm still trying to digest that. Isn't there a way for us to work with >>>>> the orders, >>>>> essentially masking off all orders that are forbidden globally. >>>>> Similar >>>>> to below, if !orders, then return 0? >>>>> /* Orders disabled directly. */ >>>>> orders &= ~TODO; >>>>> /* Orders disabled by inheriting from the global toggle. */ >>>>> if (!hugepage_global_enabled()) >>>>>        orders &= ~READ_ONCE(huge_anon_orders_inherit); >>>>> >>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. >>>>> Could >>>>> add a simple helper for that >>>>> >>>>> huge_anon_orders_never >>>> >>>> I followed Lorenzo's suggestion to simplify the logic. Does that look >>>> more readable? >>>> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>> index 2f190c90192d..3087ac7631e0 100644 >>>> --- a/include/linux/huge_mm.h >>>> +++ b/include/linux/huge_mm.h >>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct >>>> vm_area_struct *vma, >>>>                                             unsigned long tva_flags, >>>>                                             unsigned long orders); >>>> >>>> +/* Strictly mask requested anonymous orders according to sysfs >>>> settings. */ >>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long >>>> vm_flags, >>>> +                               unsigned long tva_flags, unsigned long >>>> orders) >>>> +{ >>>> +       unsigned long always = READ_ONCE(huge_anon_orders_always); >>>> +       unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >>>> +       unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >>>> +       bool inherit_enabled = hugepage_global_enabled(); >>>> +       bool has_madvise =  vm_flags & VM_HUGEPAGE; >>>> +       unsigned long mask = always | madvise; >>>> + >>>> +       mask = always | madvise; >>>> +       if (inherit_enabled) >>>> +               mask |= inherit; >>>> + >>>> +       /* All set to/inherit NEVER - never means never globally, >>>> abort. */ >>>> +       if (!(mask & orders)) >>>> +               return 0; >>> >>> Still confusing. I am not sure if we would properly catch when someone >>> specifies e.g., 2M and 1M, while we only have 2M disabled. >> >> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP). >> >>> I would rewrite the function to only ever substract from "orders". >>> >>> ... >>> >>> /* Disallow orders that are set to NEVER directly ... */ >>> order &= (always | madvise | inherit); >>> >>> /* ... or through inheritance. */ >>> if (inherit_enabled) >>>       orders &= ~inherit; >> >> Sorry, I didn't get you here. >> >> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and >> madvise are 0, and inherit_enabled = true. Then orders will be 0 with >> your logic. But we should allow order 9, right? > > Yeah, all confusing, because the temporary variables don't help. > > if (!inherit_enabled) > > or simply > > if (!hugepage_global_enabled();) > > Let me try again below. > >> >>> >>> /* >>>    * Otherwise, we only enforce sysfs settings if asked. In addition, >>>    * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS >>>    * is not set, we don't bother checking whether the VMA has >>> VM_HUGEPAGE >>>    * set. >>>    */ >>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS)) >>>       return orders; >>> >>>> + >>>> +       /* >>>> +        * Otherwise, we only enforce sysfs settings if asked. In >>>> addition, >>>> +        * if the user sets a sysfs mode of madvise and if >>>> TVA_ENFORCE_SYSFS >>>> +        * is not set, we don't bother checking whether the VMA has >>>> VM_HUGEPAGE >>>> +        * set. >>>> +        */ >>>> +       if (!(tva_flags & TVA_ENFORCE_SYSFS)) >>>> +               return orders; >>>> + >>>> +       mask = always; >>>> +       if (has_madvise) >>>> +               mask |= madvise; >>>> +       if (hugepage_global_always() || (has_madvise && >>>> inherit_enabled)) >>>> +               mask |= inherit; >>> >>> Similarly, this can maybe become (not 100% sure if I got it right, the >>> condition above is confusing) >> >> IMO, this is the original logic. > > Yeah, and it's absolutely confusing stuff. > > Let me try again by only clearing flags. Maybe this would be clearer? > (and correct? still confused why the latter part is so complicated in > existing > code) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index 8b8f353cc7b81..66fdfe06e4996 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct > vm_area_struct *vma, >                                          unsigned long tva_flags, >                                          unsigned long orders); > > +/* Strictly mask requested anonymous orders according to sysfs > settings. */ > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags, > +       unsigned long tva_flags, unsigned long orders) > +{ > +       const unsigned long always = READ_ONCE(huge_anon_orders_always); > +       const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); > +       const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); > +       const unsigned long never = ~(always | madvise | inherit); > + > +       /* Disallow orders that are set to NEVER directly ... */ > +       orders &= ~never; > + > +       /* ... or through inheritance (global == NEVER). */ > +       if (!hugepage_global_enabled()) > +               orders &= ~inherit; > + > +       /* > +        * Otherwise, we only enforce sysfs settings if asked. In addition, > +        * if the user sets a sysfs mode of madvise and if > TVA_ENFORCE_SYSFS > +        * is not set, we don't bother checking whether the VMA has > VM_HUGEPAGE > +        * set. > +        */ > +       if (!(tva_flags & TVA_ENFORCE_SYSFS)) > +               return orders; > + > +       if (!(vm_flags & VM_HUGEPAGE)) { > +               /* Disallow orders that are set to MADVISE directly ... */ > +               orders &= ~madvise; > + > +               /* ... or through inheritance (global == MADVISE). */ > +               if (!hugepage_global_always()) > +                       orders &= ~inherit; Seems we can drop this 'inherit' check, cause if !hugepage_global_enabled() == true, which always means !hugepage_global_always() is true. > +       } > +       return orders; > +} Otherwise look good to me. > + >  /** >   * thp_vma_allowable_orders - determine hugepage orders that are > allowed for vma >   * @vma:  the vm area to check > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct > vm_area_struct *vma, >                                        unsigned long orders) >  { >         /* Optimization to check if required orders are enabled early. */ > -       if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { > -               unsigned long mask = READ_ONCE(huge_anon_orders_always); > - > -               if (vm_flags & VM_HUGEPAGE) > -                       mask |= READ_ONCE(huge_anon_orders_madvise); > -               if (hugepage_global_always() || > -                   ((vm_flags & VM_HUGEPAGE) && > hugepage_global_enabled())) > -                       mask |= READ_ONCE(huge_anon_orders_inherit); > - > -               orders &= mask; > +       if (vma_is_anonymous(vma)) { > +               orders = __thp_mask_anon_orders(vm_flags, tva_flags, > orders); >                 if (!orders) >                         return 0; >         } > >