From: Nico Pache
Date: Mon, 17 Feb 2025 12:40:38 -0700
Subject: Re: [RFC v2 5/5] mm: document mTHP defer setting
To: Usama Arif
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org,
 vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
 hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
 peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
 ziy@nvidia.com, jglisse@google.com, surenb@google.com,
 vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
 jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com,
 raquini@redhat.com, dev.jain@arm.com, sunnanyong@huawei.com,
 audra@redhat.com, akpm@linux-foundation.org, rostedt@goodmis.org,
 mathieu.desnoyers@efficios.com, tiwai@suse.de,
 baolin.wang@linux.alibaba.com, corbet@lwn.net, shuah@kernel.org
References: <20250211004054.222931-1-npache@redhat.com>
 <20250211004054.222931-6-npache@redhat.com>
 <3ef9a5f3-2d63-46db-b0b5-d6f7e78c7888@gmail.com>
In-Reply-To: <3ef9a5f3-2d63-46db-b0b5-d6f7e78c7888@gmail.com>

On Mon, Feb 17, 2025 at 8:14 AM Usama Arif wrote:
>
>
>
> On 11/02/2025 00:40, Nico Pache wrote:
> > Now that we have mTHP support in khugepaged, lets add it to the
> > transhuge admin guide to provide proper guidance.
> >
>
> I think you should move this patch to the mTHP khugepaged series, and just send
> THP=defer separately from mTHP khguepaged.
>
> > Signed-off-by: Nico Pache
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 22 ++++++++++++++++------
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index b3b18573bbb4..99ba3763c1c4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -63,7 +63,7 @@ often.
> >  THP can be enabled system wide or restricted to certain tasks or even
> >  memory ranges inside task's address space. Unless THP is completely
> >  disabled, there is ``khugepaged`` daemon that scans memory and
> > -collapses sequences of basic pages into PMD-sized huge pages.
> > +collapses sequences of basic pages into huge pages.
> >
> >  The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
> >  interface and using madvise(2) and prctl(2) system calls.
> > @@ -103,8 +103,8 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
> >  Applications that would like to benefit from THPs but would still like a
> >  more memory conservative approach can choose 'defer'. This avoids
> >  inserting THPs at the page fault handler unless they are MADV_HUGEPAGE.
> > -Khugepaged will then scan the mappings for potential collapses into PMD
> > -sized pages. Admins using this the 'defer' setting should consider
> > +Khugepaged will then scan the mappings for potential collapses into (m)THP
> > +pages. Admins using this the 'defer' setting should consider
> >  tweaking khugepaged/max_ptes_none. The current default of 511 may
> >  aggressively collapse your PTEs into PMDs. Lower this value to conserve
> >  more memory (ie. max_ptes_none=64).
> > @@ -119,11 +119,14 @@ Global THP controls
> >
> >  Transparent Hugepage Support for anonymous memory can be entirely disabled
> >  (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
> > -regions (to avoid the risk of consuming more memory resources) or enabled
> > -system wide. This can be achieved per-supported-THP-size with one of::
> > +regions (to avoid the risk of consuming more memory resources), defered to
> > +khugepaged, or enabled system wide.
> > +
> > +This can be achieved per-supported-THP-size with one of::
> >
> >       echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >       echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> > +     echo defer >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >       echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> >
> >  where <size> is the hugepage size being addressed, the available sizes
> > @@ -155,6 +158,13 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
> >  sizes, the kernel will select the most appropriate enabled size for a
> >  given allocation.
> >
> > +khugepaged use max_ptes_none scaled to the order of the enabled mTHP size to
> > +determine collapses. When using mTHPs its recommended to set max_ptes_none low.
> > +Ideally less than HPAGE_PMD_NR / 2 (255 on 4k page size). This will prevent
> > +undesired "creep" behavior that leads to continously collapsing to a larger
> > +mTHP size. max_ptes_shared and max_ptes_swap have no effect when collapsing to a
> > +mTHP, and mTHP collapse will fail on shared or swapped out pages.
> > +
>
> This paragraph definitely belongs in the khugepaged series, as it doesn't have anything
> to do with THP=defer.
>
> re "Ideally less than HPAGE_PMD_NR / 2",
> what if you are running on amd, and using 16K and 2M THP=always only as, thats where
> the most TLB benefit is. Than this recommendation doesnt make sense?

That may be correct. I believe the creep requires two adjacent mTHP
levels (i.e. 512kB and 1024kB) to be enabled for the issue to really
present itself. However, with max_ptes_none=511 you will almost always
satisfy the collapse request, so your 16kB mTHPs will be promoted to
PMDs. I don't believe 511 is a good default if using mTHPs.

>
> Also even if you have all mTHP sizes as always, shouldnt you start by collapsing to
> the largest THP size first? (I haven't reviewed the khugepaged series yet, so might
> be have been discussed there, I will try and review it).

We do start at the largest first. The creep happens on a second pass
of the PMD, not immediately in the same collapse.

>
> Did you see the creep behavior you mentioned in your experiments?

Yes, I provided an example of how it happens here:
https://lore.kernel.org/lkml/CAA1CXcDiGLD=dZpFRyAuz4TLrVZZYGp=u7=Z9Q+g9RXbf-s2nA@mail.gmail.com/

>
>
> >  It's also possible to limit defrag efforts in the VM to generate
> >  anonymous hugepages in case they're not immediately free to madvise
> >  regions or to never try to defrag memory and simply fallback to regular
> > @@ -318,7 +328,7 @@ Alternatively, each supported anonymous THP size can be controlled by
> >  passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
> >  where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
> >  supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
> > -``never`` or ``inherit``.
> > +``defer``, ``never`` or ``inherit``.
> >
> >  For example, the following will set 16K, 32K, 64K THP to ``always``,
> >  set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
>
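
To make the creep discussed above concrete, here is a rough userspace
model in Python. It is only a sketch, not kernel code. It assumes that
"max_ptes_none scaled to the order" means shifting the PMD-level value
right by (HPAGE_PMD_ORDER - order), that a collapse is allowed when the
number of empty PTEs is at or below that scaled limit, and that each
scan pass tries the largest enabled order first. The names
scaled_max_ptes_none(), can_collapse() and creep() are hypothetical and
exist only for this illustration.

```python
# Illustrative model only; this is not kernel code.
HPAGE_PMD_ORDER = 9  # assumption: 2M PMD with 4K base pages (512 PTEs)


def scaled_max_ptes_none(max_ptes_none, order):
    """Assumed scaling rule: shift the PMD-level limit down to the target order."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def can_collapse(present, order, max_ptes_none):
    """True if a region of 2**order PTEs has few enough empty PTEs to collapse."""
    none = (1 << order) - present
    return none <= scaled_max_ptes_none(max_ptes_none, order)


def creep(start_order, max_ptes_none, enabled_orders):
    """Model repeated scan passes over one fully populated mTHP of
    start_order whose neighbouring PTEs are all empty.

    Each pass tries the largest enabled order first. A successful
    collapse populates every PTE of the new, larger region, which is
    what lets the next pass succeed at the next size up.
    Returns the order the region ends up at.
    """
    order = start_order
    while True:
        present = 1 << order  # the previous collapse populated every PTE
        candidates = [o for o in sorted(enabled_orders, reverse=True)
                      if o > order and can_collapse(present, o, max_ptes_none)]
        if not candidates:
            return order
        order = candidates[0]


if __name__ == "__main__":
    all_orders = range(2, HPAGE_PMD_ORDER + 1)
    for limit in (511, 300, 255):
        print(limit, creep(4, limit, all_orders))
```

In this model, with every order from 2 to 9 enabled and a single 64K
(order 4) mTHP as the starting point, max_ptes_none=511 goes straight
to the PMD order in one pass, max_ptes_none=300 climbs one order per
pass until it reaches the PMD order, and max_ptes_none=255 stays at
order 4. With only 16K and 2M enabled, 511 still promotes the 16K mTHP
to a PMD, while 255 does not.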