From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1431ef64-ed73-4a47-884f-5a803ce25e28@linux.alibaba.com>
Date: Tue, 24 Jun 2025 09:45:22 +0800
Subject: Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: Baolin Wang
To: Lorenzo Stoakes
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, ziy@nvidia.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <17180060-91a4-4957-a6aa-8e8adaf50ae8@lucifer.local>
References: <17180060-91a4-4957-a6aa-8e8adaf50ae8@lucifer.local>

On 2025/6/23 18:26, Lorenzo Stoakes wrote:
> On Mon, Jun 23, 2025 at 04:28:08PM +0800, Baolin Wang wrote:
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is
>> not specified, we will ignore the THP sysfs settings. Whilst it makes sense
>> for the callers who do not specify this flag, it creates an odd and
>> surprising situation where a sysadmin specifying 'never' for all THP sizes
>> still observes THP pages being allocated and used on the system.
>>
>> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE will ignore
>> the system-wide Anon THP sysfs settings, which means that even though we have
>> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to
>> collapse into an Anon THP. This violates the rule we have agreed upon: never
>> means never.
>>
>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
>> comments:
>>
>> "
>> /*
>>  * If we are here, we've succeeded in replacing all the native pages
>>  * in the page cache with a single hugepage. If a mm were to fault-in
>>  * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>  * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>  * analogously elide sysfs THP settings here.
>>  */
>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
>> "
>>
>> Another rule for madvise, referring to David's suggestion: "allowing for
>> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, the current strategy should be:
>>
>> If no hugepage modes are enabled for the desired orders, nor can we enable them
>> by inheriting from a 'global' enabled setting - then it must be the case that
>> all desired orders either specify or inherit 'NEVER' - and we must abort.
>>
>> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
>> THP.
>
> Thanks! Sounds good.
>
>>
>> Suggested-by: Lorenzo Stoakes
>
> Appreciate it though I'm not so bothered about attribution :) but just to say,
> of course the 'never' stuff is David's idea (and a good one!) :)

Yes, I should also add:
Suggested-by: David Hildenbrand

>> Signed-off-by: Baolin Wang
>
> LGTM so:
>
> Reviewed-by: Lorenzo Stoakes

Thanks.
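
For reference, the strategy quoted above ("if no hugepage modes are enabled
for the desired orders, nor can we enable them by inheriting ... we must
abort") can be modelled as a small userspace sketch. This is not the patch
itself and not kernel code: struct thp_settings, allowed_orders() and the
hard-coded PMD_ORDER value of 9 (x86-64 with 4K pages) are hypothetical names
and assumptions made up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: PMD order on x86-64 with 4K base pages. */
#define PMD_ORDER 9

/*
 * Hypothetical userspace model of the per-order sysfs state.
 * Each field is a bitmask where bit N stands for order N.
 */
struct thp_settings {
	unsigned long always;   /* orders whose 'enabled' is "always" */
	unsigned long madvise;  /* orders whose 'enabled' is "madvise" */
	unsigned long inherit;  /* orders whose 'enabled' is "inherit" */
	bool global_enabled;    /* global 'enabled' is "always" or "madvise" */
};

/*
 * Return the subset of the desired orders that is not 'never'.
 * A result of 0 means every desired order either specifies or
 * inherits 'never', so the caller must abort.
 */
static unsigned long allowed_orders(const struct thp_settings *s,
				    unsigned long orders)
{
	unsigned long mask;

	assert(s != NULL);

	/* Orders explicitly enabled, in either "always" or "madvise" mode. */
	mask = s->always | s->madvise;

	/* "inherit" only helps if the global setting is not 'never'. */
	if (s->global_enabled)
		mask |= s->inherit;

	return orders & mask;
}
```

With everything set to 'never', allowed_orders() returns 0 for any requested
order, which is the "never means never" case. An order in "madvise" mode stays
allowed here without any VM_HUGEPAGE check, matching the rule quoted from
David above.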