From: Randy Dunlap
Date: Mon, 14 Apr 2025 15:33:38 -0700
Subject: Re: [PATCH v3 2/4] mm: document (m)THP defer usage
To: Nico Pache, linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, shuah@kernel.org,
 david@redhat.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
 ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com,
 ioworker0@gmail.com, ziy@nvidia.com, wangkefeng.wang@huawei.com,
 dev.jain@arm.com, mhocko@suse.com, rientjes@google.com, hannes@cmpxchg.org,
 zokeefe@google.com, surenb@google.com, jglisse@google.com, cl@gentwo.org,
 jack@suse.cz, dave.hansen@linux.intel.com, will@kernel.org, tiwai@suse.de,
 catalin.marinas@arm.com, anshuman.khandual@arm.com, raquini@redhat.com,
 aarcange@redhat.com, kirill.shutemov@linux.intel.com,
 yang@os.amperecomputing.com, thomas.hellstrom@linux.intel.com,
 vishal.moola@gmail.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
 mathieu.desnoyers@efficios.com, mhiramat@kernel.org, rostedt@goodmis.org
Message-ID: <8fbdd651-c965-4fa7-a715-e01092f5de7b@infradead.org>
In-Reply-To: <20250414222456.43212-3-npache@redhat.com>
References: <20250414222456.43212-1-npache@redhat.com>
 <20250414222456.43212-3-npache@redhat.com>

On 4/14/25 3:24 PM, Nico Pache wrote:
> The new defer option for (m)THPs allows for a more conservative
> approach to (m)THPs. Document its usage in the transhuge admin-guide.
>
> Signed-off-by: Nico Pache
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 31 ++++++++++++++++------
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index f0d4e78cedaa..d3f072bdd932 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
>  may end up allocating more memory resources. An application may mmap a
>  large region but only touch 1 byte of it, in that case a 2M page might
>  be allocated instead of a 4k page for no good. This is why it's
> -possible to disable hugepages system-wide and to only have them inside
> -MADV_HUGEPAGE madvise regions.
> +possible to disable hugepages system-wide, only have them inside
> +MADV_HUGEPAGE madvise regions, or defer them away from the page fault
> +handler to khugepaged.
>
>  Embedded systems should enable hugepages only inside madvise regions
>  to eliminate any risk of wasting any precious byte of memory and to
> @@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
>  risk to lose memory by using hugepages, should use
>  madvise(MADV_HUGEPAGE) on their critical mmapped regions.
>
> +Applications that would like to benefit from THPs but would still like a
> +more memory conservative approach can choose 'defer'. This avoids
> +inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> +Khugepaged will then scan the mappings for potential collapses into (m)THP
> +pages. Admins using this the 'defer' setting should consider
> +tweaking khugepaged/max_ptes_none. The current default of 511 may
> +aggressively collapse your PTEs into PMDs. Lower this value to conserve
> +more memory (ie. max_ptes_none=64).

                i.e.,

> +
>  .. _thp_sysfs:
>
>  sysfs
> @@ -109,11 +119,14 @@ Global THP controls
>
>  Transparent Hugepage Support for anonymous memory can be entirely disabled
>  (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
> -regions (to avoid the risk of consuming more memory resources) or enabled
> -system wide. This can be achieved per-supported-THP-size with one of::
> +regions (to avoid the risk of consuming more memory resources), defered to

                                                                   deferred

> +khugepaged, or enabled system wide.
> +
> +This can be achieved per-supported-THP-size with one of::
>
>  	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>  	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> +	echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>  	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>
>  where <size> is the hugepage size being addressed, the available sizes
> @@ -136,6 +149,7 @@ The top-level setting (for use with "inherit") can be set by issuing
>  one of the following commands::
>
>  	echo always >/sys/kernel/mm/transparent_hugepage/enabled
> +	echo defer >/sys/kernel/mm/transparent_hugepage/enabled
>  	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>  	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> @@ -281,7 +295,8 @@ of small pages into one large page::
>  A higher value leads to use additional memory for programs.
>  A lower value leads to gain less thp performance. Value of
>  max_ptes_none can waste cpu time very little, you can
> -ignore it.
> +ignore it. Consider lowering this value when using
> +``transparent_hugepage=defer``
>
>  ``max_ptes_swap`` specifies how many pages can be brought in from
>  swap when collapsing a group of pages into a transparent huge page::
> @@ -306,14 +321,14 @@ Boot parameters
>
>  You can change the sysfs boot time default for the top-level "enabled"
>  control by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> -kernel command line.
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
> +``transparent_hugepage=never`` to the kernel command line.
>
>  Alternatively, each supported anonymous THP size can be controlled by
>  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
>  where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
>  supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
> -``never`` or ``inherit``.
> +``defer``, ``never`` or ``inherit``.
>
>  For example, the following will set 16K, 32K, 64K THP to ``always``,
>  set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M

-- 
~Randy