From: Barry Song <21cnbao@gmail.com>
Date: Mon, 23 Jun 2025 23:08:36 +1200
Subject: Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
To: Baolin Wang
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jun 23, 2025 at 8:28 PM Baolin Wang wrote:
>
> When invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS flag is not
> specified, we will ignore the THP sysfs settings.
> Whilst it makes sense for the
> callers who do not specify this flag, it creates a odd and surprising situation
> where a sysadmin specifying 'never' for all THP sizes still observing THP pages
> being allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE. The MADV_COLLAPSE will ignore
> the system-wide Anon THP sysfs settings, which means that even though we have
> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
> into a Anon THP. This violates the rule we have agreed upon: never means never.
>

Should we update the man page for MADV_COLLAPSE?
https://man7.org/linux/man-pages/man2/madvise.2.html

    MADV_COLLAPSE is independent of any sysfs (see sysfs(5)) setting
    under /sys/kernel/mm/transparent_hugepage, both in terms of
    determining THP eligibility, and allocation semantics. See Linux
    kernel source file Documentation/admin-guide/mm/transhuge.rst for
    more information. MADV_COLLAPSE also ignores huge= tmpfs mount
    when operating on tmpfs files. Allocation for the new hugepage may
    enter direct reclaim and/or compaction, regardless of VMA flags
    (though VM_NOHUGEPAGE is still respected).

So this effectively changes the uABI, right?

> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
> comments:
>
> "
> /*
>  * If we are here, we've succeeded in replacing all the native pages
>  * in the page cache with a single hugepage. If a mm were to fault-in
>  * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>  * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>  * analogously elide sysfs THP settings here.
>  */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> "
>
> Another rule for madvise, referring to David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, the current strategy should be:
>
> If no hugepage modes are enabled for the desired orders, nor can we enable them
> by inheriting from a 'global' enabled setting - then it must be the case that
> all desired orders either specify or inherit 'NEVER' - and we must abort.
>
> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
> THP.

It’s a bit odd that the old test case expects collapsing to succeed
even when we’ve set it to ‘never’.
Setting it to ‘always’ doesn’t seem to test anything as a counterpart.
I assume the goal is to test that setting it to ‘never’ prevents collapsing?

>
> Suggested-by: Lorenzo Stoakes
> Signed-off-by: Baolin Wang
> ---

Thanks
Barry
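
P.S. To make the "never means never" strategy quoted above concrete, here is
a rough user-space model of the decision. It is only a sketch, not the patch
itself: struct thp_settings, enum global_mode and any_order_enabled() are
hypothetical names made up for illustration, and the bitmasks only loosely
mirror the per-size always/madvise/inherit settings. Order 9 in the usage
note assumes a PMD-sized THP with 4K base pages.

```c
#include <stdbool.h>

/* Hypothetical model of the top-level "enabled" setting (not kernel API). */
enum global_mode { GLOBAL_NEVER, GLOBAL_MADVISE, GLOBAL_ALWAYS };

/* Hypothetical model of the per-size settings: bit N set means order N. */
struct thp_settings {
	unsigned long always;    /* orders whose per-size knob is "always"  */
	unsigned long madvise;   /* orders whose per-size knob is "madvise" */
	unsigned long inherit;   /* orders whose per-size knob is "inherit" */
	enum global_mode global; /* top-level setting used by "inherit"     */
};

/*
 * Return true if at least one of the requested orders is enabled, either
 * directly ("always" or "madvise") or by inheriting a global setting other
 * than "never". If this returns false, every requested order specifies or
 * inherits "never", and the collapse should be aborted.
 */
static bool any_order_enabled(const struct thp_settings *s,
			      unsigned long orders)
{
	if (orders & (s->always | s->madvise))
		return true;
	if (s->global != GLOBAL_NEVER && (orders & s->inherit))
		return true;
	return false;
}
```

With everything set to 'never', any_order_enabled(&s, 1UL << 9) is false, so
a collapse request would be refused. With the per-size knob on 'madvise' it
is true even without VM_HUGEPAGE, which matches David's suggestion as quoted
above.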