From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0C1EC61DB2 for ; Mon, 9 Jun 2025 06:10:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8DABA6B0089; Mon, 9 Jun 2025 02:10:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 88AB46B008C; Mon, 9 Jun 2025 02:10:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 779D56B0092; Mon, 9 Jun 2025 02:10:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 55F646B0089 for ; Mon, 9 Jun 2025 02:10:21 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 04D17101576 for ; Mon, 9 Jun 2025 06:10:20 +0000 (UTC) X-FDA: 83534837442.26.6B7D885 Received: from out30-101.freemail.mail.aliyun.com (out30-101.freemail.mail.aliyun.com [115.124.30.101]) by imf20.hostedemail.com (Postfix) with ESMTP id 05BDF1C0006 for ; Mon, 9 Jun 2025 06:10:17 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=E0q0ZyWA; spf=pass (imf20.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.101 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749449419; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=oADAYTc5JVpZjMlpqCH2kSIiqG6Xq9PQGyijVjW42Gw=; b=jMLtrBRklmEgF18trTgATQOoAzUfd2m501frvgSIEIUyG4Q4g14FE56HTEjn8RwNTCvXOb QE1ou+my9trcN2XXD0oh3gDbYQJy8f0BMMfhbT66ucQjJ3WJLnZmrPhe7P1ohZPjZKyjCj 5xA5JeyUXYCgp/7pUFMIPTa3l4ABXHo= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=E0q0ZyWA; spf=pass (imf20.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 115.124.30.101 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749449419; a=rsa-sha256; cv=none; b=SEXVwWld7bxzTCdQ+PV1CYAdE63r6bIL+TRSKFu4lUtU7wkssLmB4e+r6C0Q4WkNOj+emP IUPQUjwL1c9yh6tXRuHVc2Nactkq2Ep5bLxPhu4Wch4cEkZCqgXYxi43KmRR57LjXvasxi RQcwItmlye0rwP1UsVOHbfUQtusvHNs= DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1749449414; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=oADAYTc5JVpZjMlpqCH2kSIiqG6Xq9PQGyijVjW42Gw=; b=E0q0ZyWAAd64KQcbJeiuRASniI7TdXD0cqMYQs0pb742+MBYHP4bE5W9l1f1BtwJaSYd9bAvru7QvxbrCZDersxYUzrrYmuGJTUfs8XQgPS5EuBdQoywcM/euYGGQDBVuyczBVYOgwWFeuyU4gI8Jowm6Xjye9az5XV32Mnes1k= Received: from 30.74.144.144(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0WdL9ZvM_1749449413 cluster:ay36) by smtp.aliyun-inc.com; Mon, 09 Jun 2025 14:10:13 +0800 Message-ID: Date: Mon, 9 Jun 2025 14:10:12 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled To: Lorenzo Stoakes Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com> <998a069c-9be5-4a10-888c-ba8269eaa333@lucifer.local> From: Baolin Wang In-Reply-To: <998a069c-9be5-4a10-888c-ba8269eaa333@lucifer.local> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Queue-Id: 05BDF1C0006 X-Rspamd-Server: rspam09 X-Stat-Signature: 5bup99ow674oenfry7mchhk34cdd71u4 X-HE-Tag: 1749449417-673156 X-HE-Meta: U2FsdGVkX1+Kg4atCIFcMLaYtyHJkWM1QeKgXORctxzlsYfEaQTln6SQLezCCXy94qwpvnbKBafmKbhtQmninXEDS3weqqASUxbM+vbrvs3aoM4xlq0jZw3ByuaYeXjpF7LkC7Kt/16j9oWyuMYoPzQDY9kKJGnLhEwLKUW0nHddTobKbyOAyfYHkvLK90qGuIs5RuDtM4dDlMV1q2pHG1uC/nCJ7aexM8/fX7VSzNtngzQ98GqFeTyzmJ935Iv1ZFa/Khu7wzjVXPopCP7v3y3s5fE9yGqEBLWxlUMVYEUWvk7dWU1bdxWxEsiglAgXx7y9EhM4kTVXdp0v/GzCY7tDxUn1DUuztv3d6N/2EcbZ05AqPNe3XYZfqnHZP99InOnoF9z/bCCv69+0TbdUJflcfoNB2yWY1wK2Vd3/jOvtTd4S7iC+UXVgWDkbn0hrsXhFsh9+lwO78Ey0hkYLVmVRgkMu7PM0FqwVKr3smSLniDPKxpT2JjxVdgYpNl1AA1dD4WENnfpXuh1AY+MMk/7fW/s+6v0BbgOuNBcOPQY98sKbQ+r+NaYEphudTjxnjWJN98GC1TztZSkBs3iPUZ7hcRePnljr/xcDmM9mWFK0hbmK8CAqKuwfUF3TRd40XYbC7acbRg7Wq7/volg729NAtVxXWTLCtPHtpnuorgulFcx7/4dv5zp5bj3zLP5oNC3WWwIMkdATn4PDMtVVZfxPK4mwmGGQ1DXyRRo2MLntANnQEY8o4Yi8k2urj2LA+xxExKz9nuiiyWHNGvsnGvZvBPIlPmu3pSgJJjgf9JOry6d5Gp/QetVkG+vu/4G3gYfOdWu2Qw1vwgRagl8snJSnA7pDGSx13L+j7tTbLmd1D3kWp/0TIBKd6ueBAnaJcZgnccbNlb1nzNJbk2dSFXEepeQRYc/0sn+TEtojaOfUXsLEeBAlIc7dgOSCFnloqexIx5/7xAYMPiBGsXJ zS5vNCnx Wrt53YRREVCP2CkRGn9w5UHhoInRMuXk3gt90wOJt/8v2hgmetKSXuu4N8jvfcbMZbEtA8M4rt2hq6BlvWXhyrOUlNL0Y5CQdIUUftHISvMZh/h1WbL8sfixZrcObJ7n5dP2Bz6Cc6BZGRy/3qBojuB4Rq525h/1xrgH0XUrSUdOuiezgrztMGZTBcEQWYVhyirZTKQwT0EdQJj60VB0TGkbTDeZlCKkyZe8JANFV9IMnQrHVsYSCGnUwkmwXV30hpDyC9mixIkvTHfartkKfbyDkZHhNE6r2fyJkk5GCG5Ps8iFuS6pbBXK/cuMxJ7t+pLYw2CsItye42462tBSvy3Q30Dfypk0DOkxyfFpLO44fTdxNGo8h+/tXD0ComOu40g6pwD74bwfd7y7jnaZdbREEtA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2025/6/7 19:55, Lorenzo Stoakes wrote: > Not related to your patch at all, but man this whole thing (thp allowed orders) > needs significant improvement, it seems always perversely complicated for a > relatively simple operation. > > Overall I LOVE what you're doing here, but I feel we can clarify things a > little while we're at it to make it clear exactly what we're doing. > > This is a very important change so forgive my fiddling about here but I'm > hoping we can take the opportunity to make things a little simpler! > > On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote: >> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which >> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE >> will still attempt to collapse into a Anon THP. This violates the rule we have >> agreed upon: never means never. >> >> Another rule for madvise, referring to David's suggestion: “allowing for collapsing >> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". > > I'm generally not sure it's worth talking only about MADV_COLLAPSE here when > you're changing what THP is permitted across the board, I may have missed some > discussion and forgive me if so, but what is special about MADV_COLLAPSE's use > of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other > users? We found that MADV_COLLAPSE ignores the THP configuration, meaning that even when THP is set to 'never', MADV_COLLAPSE can still collapse into THPs (and mTHPs in the future). This is because when MADV_COLLAPSE calls thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag, which means it ignores the system-wide Anon THP sysfs settings. So this patch set is aimed to fix the THP policy for MADV_COLLAPSE. >> To address this issue, should check whether the Anon THP configuration is disabled >> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set. >> >> In summary, the current strategy is: >> >> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false >> (global THP settings are not enabled), it means mTHP of that orders are prohibited >> from being used, then madvise_collapse() is forbidden for that orders. >> >> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true >> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that >> orders are still prohibited from being used, thus madvise_collapse() is not allowed >> for that orders. > > OK so it's already confusing that the global settings only impact 'inherit' > settings below, so they're not really global at all, but rather perhaps should > be called 'inherited'. > > Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps > that'd just add to the confusion :P > > OK this is also not your fault just general commentary. > > Anyway, I feel points 1 and 2 can more succinctly be summed up as below, > also there's no need to refer to the code, it's actually clearer I think to > refer to the underlying logic: > > If no hugepage modes are enabled for the desired orders, nor can we > enable them by inheriting from a 'global' enabled setting - then it > must be the case that all desired orders either specify or inherit > 'NEVER' - and we must abort. OK. Thanks for helping me make it simpler:) >> >> Reviewed-by: Zi Yan >> Signed-off-by: Baolin Wang >> --- >> include/linux/huge_mm.h | 23 +++++++++++++++++++---- >> 1 file changed, 19 insertions(+), 4 deletions(-) >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 2f190c90192d..199ddc9f04a1 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma, >> unsigned long orders) >> { >> /* Optimization to check if required orders are enabled early. */ >> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { >> - unsigned long mask = READ_ONCE(huge_anon_orders_always); >> + if (vma_is_anonymous(vma)) { >> + unsigned long always = READ_ONCE(huge_anon_orders_always); >> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >> + unsigned long mask = always | madvise; >> + >> + /* >> + * If the system-wide THP/mTHP sysfs settings are disabled, >> + * then we should never allow hugepages. >> + */ >> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders))) >> + return 0; >> + >> + if (!(tva_flags & TVA_ENFORCE_SYSFS)) >> + goto skip; >> >> + mask = always; >> if (vm_flags & VM_HUGEPAGE) >> - mask |= READ_ONCE(huge_anon_orders_madvise); >> + mask |= madvise; >> if (hugepage_global_always() || >> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled())) >> - mask |= READ_ONCE(huge_anon_orders_inherit); >> + mask |= inherit; >> >> orders &= mask; >> if (!orders) >> return 0; >> } >> >> +skip: >> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders); >> } > > I feel this is compressing a lot of logic in a way that took me several > readings to understand (hey I might not be the smartest cookie in the jar, > but we need to account for all levels of kernel developer ;) > > I feel like we can make things a lot clearer here by separating out with a > helper function (means we can drop some indentation too), and also take > advantage of the fact that, if orders == 0, __thp_vma_allowable_orders() > exits with 0 early so no need for us to do so ourselves: Sure. Looks good to me. Thanks. > /* Strictly mask requested anonymous orders according to sysfs settings. */ > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags, > unsigned long tva_flags, unsigned long orders) > { > unsigned long always = READ_ONCE(huge_anon_orders_always); > unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); > unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);; > bool inherit_enabled = hugepage_global_enabled(); > bool has_madvise = vm_flags & VM_HUGEPAGE; > unsigned long mask = always | madvise; > > mask = always | madvise; > if (inherit_enabled) > mask |= inherit; > > /* All set to/inherit NEVER - never means never globally, abort. */ > if (!(mask & orders)) > return 0; > > /* Otherwise, we only enforce sysfs settings if asked. */ > if (!(tva_flags & TVA_ENFORCE_SYSFS)) > return orders; > > mask = always; > if (has_madvise) > mask |= madvise; > if (hugepage_global_always() || (has_madvise && inherit_enabled)) > mask |= inherit; > > return orders & mask; > } > > ... > > static inline > unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma, > unsigned long vm_flags, > unsigned long tva_flags, > unsigned long orders) > { > if (vma_is_anonymous(vma)) > orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders); > > return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders); > }