From: Nico Pache
Date: Mon, 17 Feb 2025 12:30:56 -0700
Subject: Re: [RFC v2 2/5] mm: document transparent_hugepage=defer usage
To: Usama Arif
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org,
 vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
 hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
 peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
 ziy@nvidia.com, jglisse@google.com, surenb@google.com,
 vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
 jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com,
 raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com,
 audra@redhat.com, akpm@linux-foundation.org, rostedt@goodmis.org,
 mathieu.desnoyers@efficios.com, tiwai@suse.de,
 baolin.wang@linux.alibaba.com, corbet@lwn.net, shuah@kernel.org
References: <20250211004054.222931-1-npache@redhat.com>
 <20250211004054.222931-3-npache@redhat.com>

On Mon, Feb 17, 2025 at 8:04 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > The new transparent_hugepage=defer option allows for a more conservative
> > approach to THPs. Document its usage in the transhuge admin-guide.
> >
> > Signed-off-by: Nico Pache
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 22 +++++++++++++++++-----
> >  1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index dff8d5985f0f..b3b18573bbb4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -88,8 +88,9 @@ In certain cases when hugepages are enabled system wide, application
> >  may end up allocating more memory resources. An application may mmap a
> >  large region but only touch 1 byte of it, in that case a 2M page might
> >  be allocated instead of a 4k page for no good. This is why it's
> > -possible to disable hugepages system-wide and to only have them inside
> > -MADV_HUGEPAGE madvise regions.
> > +possible to disable hugepages system-wide, only have them inside
> > +MADV_HUGEPAGE madvise regions, or defer them away from the page fault
> > +handler to khugepaged.
> >
> >  Embedded systems should enable hugepages only inside madvise regions
> >  to eliminate any risk of wasting any precious byte of memory and to
> > @@ -99,6 +100,15 @@ Applications that gets a lot of benefit from hugepages and that don't
> >  risk to lose memory by using hugepages, should use
> >  madvise(MADV_HUGEPAGE) on their critical mmapped regions.
> >
> > +Applications that would like to benefit from THPs but would still like a
> > +more memory conservative approach can choose 'defer'. This avoids
> > +inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> > +Khugepaged will then scan the mappings for potential collapses into PMD
> > +sized pages. Admins using this the 'defer' setting should consider
> > +tweaking khugepaged/max_ptes_none. The current default of 511 may
> > +aggressively collapse your PTEs into PMDs. Lower this value to conserve
> > +more memory (ie. max_ptes_none=64).
> > +
>
> maybe remove the "(ie. max_ptes_none=64)", it's appearing as a recommendation for
> the value, but it might not be optimal for different workloads.
>
> >  .. _thp_sysfs:
> >
> >  sysfs
> > @@ -136,6 +146,7 @@ The top-level setting (for use with "inherit") can be set by issuing
> >  one of the following commands::
> >
> >       echo always >/sys/kernel/mm/transparent_hugepage/enabled
> > +     echo defer >/sys/kernel/mm/transparent_hugepage/enabled
> >       echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> >       echo never >/sys/kernel/mm/transparent_hugepage/enabled
> >
> > @@ -274,7 +285,8 @@ of small pages into one large page::
> >  A higher value leads to use additional memory for programs.
> >  A lower value leads to gain less thp performance. Value of
> >  max_ptes_none can waste cpu time very little, you can
> > -ignore it.
> > +ignore it. Consider lowering this value when using
> > +``transparent_hugepage=defer``
>
> lowering this value even with thp=always makes sense, as there might be cases
> when pf might not give a THP, but a VMA becomes eligible to scan via khugepaged
> later? I would remove this line.

Perhaps I should be clearer, or create a separate section for it. The
point is that defer was created to limit internal fragmentation and
leave it to khugepaged to decide when a THP is "useful" (less
wasteful). But to actually get that reduction in waste, we should also
not be using the default max_ptes_none.

Ideally I would change "always" to ignore max_ptes_none (i.e. act as
if max_ptes_none=511), and change the max_ptes_none default to 64 or
128. But that's a separate discussion that I didn't want distracting
from these postings.
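
As a rough illustration of why the default matters, here is a minimal
arithmetic sketch. It assumes 4 KiB base pages and 2 MiB PMD-sized
THPs (so 512 PTEs per PMD); the function names are hypothetical and
exist only for this example. It prints, for a given max_ptes_none, the
fewest PTEs that must already be populated before khugepaged may
collapse the range, and the worst-case amount of previously unbacked
memory one collapse can add.

```shell
#!/bin/sh
# Illustrative only: assumes 4 KiB base pages and 512 PTEs per PMD.
PAGE_KIB=4
PTES_PER_PMD=512

# min_present N: fewest populated PTEs required when max_ptes_none=N
min_present() {
    echo $((PTES_PER_PMD - $1))
}

# max_waste_kib N: worst-case KiB of newly backed empty pages per collapse
max_waste_kib() {
    echo $(($1 * PAGE_KIB))
}

for n in 511 128 64; do
    printf 'max_ptes_none=%s: needs >= %s populated PTEs, may add up to %s KiB\n' \
        "$n" "$(min_present "$n")" "$(max_waste_kib "$n")"
done
```

With these assumptions, the default of 511 lets a single populated PTE
trigger a collapse, while 64 requires 448 of the 512 PTEs to be
populated. The right value still depends on the workload.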
>
> >
> >  ``max_ptes_swap`` specifies how many pages can be brought in from
> >  swap when collapsing a group of pages into a transparent huge page::
> > @@ -299,8 +311,8 @@ Boot parameters
> >
> >  You can change the sysfs boot time default for the top-level "enabled"
> >  control by passing the parameter ``transparent_hugepage=always`` or
> > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> > -kernel command line.
> > +``transparent_hugepage=madvise`` or ``transparent_hugepage=defer`` or
> > +``transparent_hugepage=never`` to the kernel command line.
> >
> >  Alternatively, each supported anonymous THP size can be controlled by
> >  passing ``thp_anon=[KMG],[KMG]:;[KMG]-[KMG]:``,
>