From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36E46C61CE8 for ; Thu, 12 Jun 2025 13:40:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B1D7E6B0089; Thu, 12 Jun 2025 09:40:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id ACE846B008A; Thu, 12 Jun 2025 09:40:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9BD786B008C; Thu, 12 Jun 2025 09:40:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 7BCF96B0089 for ; Thu, 12 Jun 2025 09:40:40 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 9002DBF1B6 for ; Thu, 12 Jun 2025 13:40:39 +0000 (UTC) X-FDA: 83546858598.08.149BF54 Received: from out30-110.freemail.mail.aliyun.com (out30-110.freemail.mail.aliyun.com [115.124.30.110]) by imf08.hostedemail.com (Postfix) with ESMTP id 1EB55160019 for ; Thu, 12 Jun 2025 13:40:34 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=Sf1ekAKy; dmarc=pass (policy=none) header.from=linux.alibaba.com; spf=pass (imf08.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.110 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749735636; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7cyvgBhOFB4fSDeTdYe5OpcbaQcdxsp6HYp/N+SQ6y4=; b=hE+ObxnsaCvrUADfSWcowCoCOgb4+M60OGB3SZcgzfxtL1YvwpacSFDzXU0a263oaOsclo PQFlNEP18iMx7LmxYv+wyKg89i7+CYKpHtUEmVCS7wZ9iqhnaJiUpccBUI0BgQIwgj/+s3 lRE/m1zKmAEue2CcvnXr10mt4NCsuro= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749735636; a=rsa-sha256; cv=none; b=XiM+KX8wq7rV6b7yMgSYw+bhAijaezc122iyWgcv5QpmSe9vsY+jY9IaFD0wo0z9jYV680 E/y6335tC9Ai+/g1FihdqhBDSAHdi8HnmWV/CAl9woVwWGDUaW7CPx7sHeITJBl9ZEeRWz kWvPTALffKlDa8e7ZUub2xuXiT9NIo4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=Sf1ekAKy; dmarc=pass (policy=none) header.from=linux.alibaba.com; spf=pass (imf08.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.110 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1749735631; h=Message-ID:Date:MIME-Version:Subject:From:To:Content-Type; bh=7cyvgBhOFB4fSDeTdYe5OpcbaQcdxsp6HYp/N+SQ6y4=; b=Sf1ekAKyHxzdH2QMbHpYpThxi2xBJT9frmhzVuheMkXT3+TG1J0hlzOZNi7nZEmZSK0Dk69KIBI03R/Gh6F+ULbhlUMp4QnXTBe1bnqbWEK2V/kUIeki8vjZPEIMrHQdKPFvyuf7Dw4LwGI6V4sk8UcMYcfE8Bd4hc/NUXxTGeo= Received: from 30.39.247.88(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0WdhB3FZ_1749735628 cluster:ay36) by smtp.aliyun-inc.com; Thu, 12 Jun 2025 21:40:28 +0800 Message-ID: <4511328c-cf70-4379-98d1-b66fd7b3d9fc@linux.alibaba.com> Date: Thu, 12 Jun 2025 21:40:27 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled From: Baolin Wang To: David Hildenbrand , akpm@linux-foundation.org, hughd@google.com Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com> <1ec368c4-c4d8-41ea-b8a3-7d1fdb3ec358@redhat.com> <2ff65f37-efa9-4e96-9cdf-534d63ff154e@linux.alibaba.com> <953596b2-8749-493d-97eb-a5d8995d9ef8@redhat.com> <97a67b74-d473-455e-a05e-c85fe45da008@linux.alibaba.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Stat-Signature: uyyon8ycyhbt5379r3p9enu7s5h9sabo X-Rspamd-Queue-Id: 1EB55160019 X-Rspam-User: X-Rspamd-Server: rspam06 X-HE-Tag: 1749735634-130575 X-HE-Meta: U2FsdGVkX18MyjMVIQ7WorwMrWtYLYVDwdoaLS/OhU4Uan2GsG+KeS+q6np0I+wbO7xvGzgpGW0Alqc2QghA3WWSjNsc5xTF6xmb8OH2hSodWed78cQ/k5SFRSB/EuhlCYYeDR8Q8HrFTaxIOeS8qyaO8yHmT9CMKnW73nvnYpvv+o3GYg3OrcSrXaLwwq4+Qbjo4S4gIgz7Cy9NUXicJbq3RM0eIug4XcoqpQenx0zTlgzDXOTPsw1ms33voPhqFfn+PFmtaP2Ky7mhucPUwF8YET9JxchFg0nNTZmv7jQj2B4sz/oVqAV/R2khTlNi15/yLvkE5aD0+I9+XNinyXItIoKk6JxJyl8YJ67xfHTNCdG8ReIj4VRKJlO38C56UkWVcC1COZGknZFtzCvQndYJhhuLCftU+mJUrfhqUGOER3BWhPzaLCAz5wxV25Kz3FQZ7VL3hl3KWlFA6eCDp6lLLNPOpIEVoEeYK1yaJs3pBduJz2ruPxA6iINOBaPFnxSpwZjeUq4MomgH57dEJwVfsXS1FtRIGSDuCgHs/no6E9/FfU9tlWvpIIPqAE+8GrwWzX1oxtttAgeyZeaJv2Cc9KJrFi21uc1EsPaf88mJ00Hlz40nxtfiYwxgcLClUcLuT1CrDwzCSs2svPHfMUWwrx86GQjRRKzaQdcaPZYqDxh1VmSPSxvLWVLxaoaeuy31WNjpkBQxLk72WKP6QvG23yoI1+XCRNsbnB8ei08E+njM09QP2KhXxSALdsml0LplA2UrMxs/2JMBG20FFpt1tZUv0KTJpo/1CuA+1HDO12emVKEzQ0Besr/msz3dCU6LBBdTpuygxSbTNzu6wdvS50J4ITpgE6JbWKPL4+kL9c65br5F+uZhcaboSzIlYACce2d3YtaFUnPHeKZ+nQb58OFerYDKpme8agBAq+8ou6h0wL/f0uZrnN2eS1az7NA717KbBDfunoDUQ8G Uo48Cywy jluGetl0DcV+JZSvhLBrGYsM0WTmgu1K7ravohh5ciAaiW+2PuAOOFObCFaNxMrqsqMdI9uGXCra4enZFA5amuWQRHui8MmW008aAUCnSaRtmSPKgqSrre098GgnVJiOYNIbRFbw+wqWTJa+1ABGdddXUNpgNNviPZuclKBtRENLoaRGv/6B2nfAGwuiAFeYcbeUKHeKGMviom4VV5U2z+V0ZB59ENUTBRDA4tk2lCV8N1tpzOJapV0uvSPaA8p5/BHLekSjom6ouozAYdT5TaeR/nIXRad1W2C+/lg5nUfftt6Tjz3waqSW4EHl6E/pbbADot08xLEbB/0XOKtFTnZcnvwYGUpSBRhnFs48NciGxxMCaMwI63oVBpg0ADJ80mkK8y556qAwyNx9NqPbzofIzCA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2025/6/12 21:25, Baolin Wang wrote: > > > On 2025/6/12 21:05, David Hildenbrand wrote: >> On 12.06.25 14:45, Baolin Wang wrote: >>> >>> >>> On 2025/6/12 16:51, David Hildenbrand wrote: >>>> On 12.06.25 09:51, Baolin Wang wrote: >>>>> >>>>> >>>>> On 2025/6/11 20:34, David Hildenbrand wrote: >>>>>> On 05.06.25 10:00, Baolin Wang wrote: >>>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs >>>>>>> settings, >>>>>>> which >>>>>>> means that even though we have disabled the Anon THP configuration, >>>>>>> MADV_COLLAPSE >>>>>>> will still attempt to collapse into a Anon THP. This violates the >>>>>>> rule >>>>>>> we have >>>>>>> agreed upon: never means never. >>>>>>> >>>>>>> Another rule for madvise, referring to David's suggestion: “allowing >>>>>>> for collapsing >>>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". >>>>>>> >>>>>>> To address this issue, should check whether the Anon THP >>>>>>> configuration >>>>>>> is disabled >>>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS >>>>>>> flag is >>>>>>> set. >>>>>>> >>>>>>> In summary, the current strategy is: >>>>>>> >>>>>>> 1. If always & orders == 0, and madvise & orders == 0, and >>>>>>> hugepage_global_enabled() == false >>>>>>> (global THP settings are not enabled), it means mTHP of that orders >>>>>>> are prohibited >>>>>>> from being used, then madvise_collapse() is forbidden for that >>>>>>> orders. >>>>>>> >>>>>>> 2. If always & orders == 0, and madvise & orders == 0, and >>>>>>> hugepage_global_enabled() == true >>>>>>> (global THP settings are enabled), and inherit & orders == 0, it >>>>>>> means >>>>>>> mTHP of that >>>>>>> orders are still prohibited from being used, thus madvise_collapse() >>>>>>> is not allowed >>>>>>> for that orders. >>>>>>> >>>>>>> Reviewed-by: Zi Yan >>>>>>> Signed-off-by: Baolin Wang >>>>>>> --- >>>>>>>     include/linux/huge_mm.h | 23 +++++++++++++++++++---- >>>>>>>     1 file changed, 19 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>>>> index 2f190c90192d..199ddc9f04a1 100644 >>>>>>> --- a/include/linux/huge_mm.h >>>>>>> +++ b/include/linux/huge_mm.h >>>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct >>>>>>> vm_area_struct *vma, >>>>>>>                            unsigned long orders) >>>>>>>     { >>>>>>>         /* Optimization to check if required orders are enabled >>>>>>> early. */ >>>>>>> -    if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { >>>>>>> -        unsigned long mask = READ_ONCE(huge_anon_orders_always); >>>>>>> +    if (vma_is_anonymous(vma)) { >>>>>>> +        unsigned long always = READ_ONCE(huge_anon_orders_always); >>>>>>> +        unsigned long madvise = >>>>>>> READ_ONCE(huge_anon_orders_madvise); >>>>>>> +        unsigned long inherit = >>>>>>> READ_ONCE(huge_anon_orders_inherit); >>>>>>> +        unsigned long mask = always | madvise; >>>>>>> + >>>>>>> +        /* >>>>>>> +         * If the system-wide THP/mTHP sysfs settings are disabled, >>>>>>> +         * then we should never allow hugepages. >>>>>>    > +         */> +        if (!(mask & orders) && >>>>>> !(hugepage_global_enabled() && (inherit & orders))) >>>>>>> +            return 0; >>>>>> >>>>>> I'm still trying to digest that. Isn't there a way for us to work >>>>>> with >>>>>> the orders, >>>>>> essentially masking off all orders that are forbidden globally. >>>>>> Similar >>>>>> to below, if !orders, then return 0? >>>>>> /* Orders disabled directly. */ >>>>>> orders &= ~TODO; >>>>>> /* Orders disabled by inheriting from the global toggle. */ >>>>>> if (!hugepage_global_enabled()) >>>>>>        orders &= ~READ_ONCE(huge_anon_orders_inherit); >>>>>> >>>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. >>>>>> Could >>>>>> add a simple helper for that >>>>>> >>>>>> huge_anon_orders_never >>>>> >>>>> I followed Lorenzo's suggestion to simplify the logic. Does that look >>>>> more readable? >>>>> >>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>> index 2f190c90192d..3087ac7631e0 100644 >>>>> --- a/include/linux/huge_mm.h >>>>> +++ b/include/linux/huge_mm.h >>>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct >>>>> vm_area_struct *vma, >>>>>                                             unsigned long tva_flags, >>>>>                                             unsigned long orders); >>>>> >>>>> +/* Strictly mask requested anonymous orders according to sysfs >>>>> settings. */ >>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long >>>>> vm_flags, >>>>> +                               unsigned long tva_flags, unsigned long >>>>> orders) >>>>> +{ >>>>> +       unsigned long always = READ_ONCE(huge_anon_orders_always); >>>>> +       unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >>>>> +       unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >>>>> +       bool inherit_enabled = hugepage_global_enabled(); >>>>> +       bool has_madvise =  vm_flags & VM_HUGEPAGE; >>>>> +       unsigned long mask = always | madvise; >>>>> + >>>>> +       mask = always | madvise; >>>>> +       if (inherit_enabled) >>>>> +               mask |= inherit; >>>>> + >>>>> +       /* All set to/inherit NEVER - never means never globally, >>>>> abort. */ >>>>> +       if (!(mask & orders)) >>>>> +               return 0; >>>> >>>> Still confusing. I am not sure if we would properly catch when someone >>>> specifies e.g., 2M and 1M, while we only have 2M disabled. >>> >>> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP). >>> >>>> I would rewrite the function to only ever substract from "orders". >>>> >>>> ... >>>> >>>> /* Disallow orders that are set to NEVER directly ... */ >>>> order &= (always | madvise | inherit); >>>> >>>> /* ... or through inheritance. */ >>>> if (inherit_enabled) >>>>       orders &= ~inherit; >>> >>> Sorry, I didn't get you here. >>> >>> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and >>> madvise are 0, and inherit_enabled = true. Then orders will be 0 with >>> your logic. But we should allow order 9, right? >> >> Yeah, all confusing, because the temporary variables don't help. >> >> if (!inherit_enabled) >> >> or simply >> >> if (!hugepage_global_enabled();) >> >> Let me try again below. >> >>> >>>> >>>> /* >>>>    * Otherwise, we only enforce sysfs settings if asked. In addition, >>>>    * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS >>>>    * is not set, we don't bother checking whether the VMA has >>>> VM_HUGEPAGE >>>>    * set. >>>>    */ >>>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS)) >>>>       return orders; >>>> >>>>> + >>>>> +       /* >>>>> +        * Otherwise, we only enforce sysfs settings if asked. In >>>>> addition, >>>>> +        * if the user sets a sysfs mode of madvise and if >>>>> TVA_ENFORCE_SYSFS >>>>> +        * is not set, we don't bother checking whether the VMA has >>>>> VM_HUGEPAGE >>>>> +        * set. >>>>> +        */ >>>>> +       if (!(tva_flags & TVA_ENFORCE_SYSFS)) >>>>> +               return orders; >>>>> + >>>>> +       mask = always; >>>>> +       if (has_madvise) >>>>> +               mask |= madvise; >>>>> +       if (hugepage_global_always() || (has_madvise && >>>>> inherit_enabled)) >>>>> +               mask |= inherit; >>>> >>>> Similarly, this can maybe become (not 100% sure if I got it right, the >>>> condition above is confusing) >>> >>> IMO, this is the original logic. >> >> Yeah, and it's absolutely confusing stuff. >> >> Let me try again by only clearing flags. Maybe this would be clearer? >> (and correct? still confused why the latter part is so complicated in >> existing >> code) >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 8b8f353cc7b81..66fdfe06e4996 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct >> vm_area_struct *vma, >>                                           unsigned long tva_flags, >>                                           unsigned long orders); >> >> +/* Strictly mask requested anonymous orders according to sysfs >> settings. */ >> +static inline unsigned long __thp_mask_anon_orders(unsigned long >> vm_flags, >> +       unsigned long tva_flags, unsigned long orders) >> +{ >> +       const unsigned long always = READ_ONCE(huge_anon_orders_always); >> +       const unsigned long madvise = >> READ_ONCE(huge_anon_orders_madvise); >> +       const unsigned long inherit = >> READ_ONCE(huge_anon_orders_inherit); >> +       const unsigned long never = ~(always | madvise | inherit); >> + >> +       /* Disallow orders that are set to NEVER directly ... */ >> +       orders &= ~never; >> + >> +       /* ... or through inheritance (global == NEVER). */ >> +       if (!hugepage_global_enabled()) >> +               orders &= ~inherit; >> + >> +       /* >> +        * Otherwise, we only enforce sysfs settings if asked. In >> addition, >> +        * if the user sets a sysfs mode of madvise and if >> TVA_ENFORCE_SYSFS >> +        * is not set, we don't bother checking whether the VMA has >> VM_HUGEPAGE >> +        * set. >> +        */ >> +       if (!(tva_flags & TVA_ENFORCE_SYSFS)) >> +               return orders; >> + >> +       if (!(vm_flags & VM_HUGEPAGE)) { >> +               /* Disallow orders that are set to MADVISE directly >> ... */ >> +               orders &= ~madvise; >> + >> +               /* ... or through inheritance (global == MADVISE). */ >> +               if (!hugepage_global_always()) >> +                       orders &= ~inherit; > > Seems we can drop this 'inherit' check, cause if > !hugepage_global_enabled() == true, which always means > !hugepage_global_always() is true. Please ignore this incorrect comment. Logic is confusing :)