From: Nico Pache
Date: Wed, 21 May 2025 04:19:14 -0600
Subject: Re: [PATCH v6 0/4] mm: introduce THP deferred setting
To: Yafang Shao
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 rientjes@google.com, hannes@cmpxchg.org, lorenzo.stoakes@oracle.com,
 rdunlap@infradead.org, mhocko@suse.com, Liam.Howlett@oracle.com,
 zokeefe@google.com, surenb@google.com, jglisse@google.com, cl@gentwo.org,
 jack@suse.cz, dave.hansen@linux.intel.com, will@kernel.org, tiwai@suse.de,
 catalin.marinas@arm.com, anshuman.khandual@arm.com, dev.jain@arm.com,
 raquini@redhat.com, aarcange@redhat.com, kirill.shutemov@linux.intel.com,
 yang@os.amperecomputing.com, thomas.hellstrom@linux.intel.com,
 vishal.moola@gmail.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, shuah@kernel.org,
 peterx@redhat.com, willy@infradead.org, ryan.roberts@arm.com,
 baolin.wang@linux.alibaba.com, baohua@kernel.org, david@redhat.com,
 mathieu.desnoyers@efficios.com, mhiramat@kernel.org, rostedt@goodmis.org,
 corbet@lwn.net, akpm@linux-foundation.org
References: <20250515033857.132535-1-npache@redhat.com>

On Tue, May 20, 2025 at 3:25 AM Yafang Shao wrote:
>
> On Thu, May 15, 2025 at 11:41 AM Nico Pache wrote:
> >
> > This series is a follow-up to [1], which adds mTHP support to khugepaged.
> > mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
> > configs to make sense.
> > Without it, the global="defer" and mTHP="inherit" case
> > is "undefined" behavior.
> >
> > We've seen cases where customers switching from RHEL7 to RHEL8 see a
> > significant increase in the memory footprint for the same workloads.
> >
> > Through our investigations we found that a large contributing factor to
> > the increase in RSS was an increase in THP usage.
> >
> > For workloads like MySQL, or when using allocators like jemalloc, it is
> > often recommended to set /transparent_hugepages/enabled=never. This is
> > in part due to performance degradations and increased memory waste.
> >
> > This series introduces enabled=defer. This setting acts as a middle
> > ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
> > page fault handler will act normally, making a hugepage if possible. If
> > the allocation is not MADV_HUGEPAGE, then the page fault handler will
> > default to the base size allocation. The caveat is that khugepaged can
> > still operate on pages that are not MADV_HUGEPAGE.
> >
> > This allows for three things: one, applications specifically designed to
> > use hugepages will get them, and two, applications that don't use
> > hugepages can still benefit from them without aggressively inserting
> > THPs at every possible chance. This curbs the memory waste, and defers
> > the use of hugepages to khugepaged. Khugepaged can then scan the memory
> > for eligible collapsing. Lastly, there is the added benefit for those who
> > want THPs but experience higher latency PFs. Now you can get base page
> > performance at the PF handler and hugepage performance for those mappings
> > after they collapse.
> >
> > Admins may want to lower max_ptes_none; if not, khugepaged may
> > aggressively collapse single allocations into hugepages.
> >
> > TESTING:
> > - Built for x86_64, aarch64, ppc64le, and s390x
> > - selftests mm
> > - In [1] I provided a script [2] that has multiple access patterns
> > - lots of general use.
> > - redis testing. This test was my original case for the defer mode. What I
> >    was able to prove was that THP=always leads to increased max_latency
> >    cases; hence why it is recommended to disable THPs for redis servers.
> >    However, with 'defer' we don't have the max_latency spikes and can still
> >    get the system to utilize THPs. I further tested this with the mTHP
> >    defer setting and found that redis (and probably other jemalloc users)
> >    can utilize THPs via defer (+mTHP defer) without a large latency
> >    penalty and with some potential gains. I uploaded some mmtest results
> >    here[3], which compare:
> >        stock+thp=never
> >        stock+(m)thp=always
> >        khugepaged-mthp + defer (max_ptes_none=64)
> >
> >   The results show that (m)THPs can cause some throughput regression in
> >   some cases, but also have gains in other cases. The mTHP+defer results
> >   have more gains and fewer losses than the (m)THP=always case.
> >
> > V6 Changes:
> > - nits
> > - rebased dependent series and added review tags
> >
> > V5 Changes:
> > - rebased dependent series
> > - added reviewed-by tag on 2/4
> >
> > V4 Changes:
> > - Minor Documentation fixes
> > - rebased the dependent series [1] onto mm-unstable
> >     commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")
> >
> > V3 Changes:
> > - Combined the documentation commits into one, and moved a section to the
> >   khugepaged mthp patchset
> >
> > V2 Changes:
> > - base changes on mTHP khugepaged support
> > - Fix selftests parsing issue
> > - add mTHP defer option
> > - add mTHP defer Documentation
> >
> > [1] - https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com/
> > [2] - https://gitlab.com/npache/khugepaged_mthp_test
> > [3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html
> >
> > Nico Pache (4):
> >   mm: defer THP insertion to khugepaged
> >   mm: document (m)THP defer usage
> >   khugepaged: add defer option to mTHP options
> >   selftests: mm: add defer to thp setting parser
> >
> >  Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
> >  include/linux/huge_mm.h                    | 18 +++++-
> >  mm/huge_memory.c                           | 69 +++++++++++++++++++---
> >  mm/khugepaged.c                            |  8 +--
> >  tools/testing/selftests/mm/thp_settings.c  |  1 +
> >  tools/testing/selftests/mm/thp_settings.h  |  1 +
> >  6 files changed, 106 insertions(+), 22 deletions(-)
> >
> > --
> > 2.49.0
> >
> >
>
> Hello Nico,
>
> Upon reviewing the series, it occurred to me that BPF could solve this
> more cleanly. Adding a 'tva_flags' parameter to the BPF hook would
> handle this case and future scenarios without requiring new modes. The
> BPF mode could then serve as a unified solution.

Hi Yafang,

I don't see how this is the case. This would require users to
modify/add functionality rather than configuring the system in this
manner. What if BPF is not configured or not being used? Having to use
an additional technology that requires precise configuration doesn't
seem cleaner.

Either way, thank you for taking a look at the series!

-- Nico
>
> --
> Regards
> Yafang
>
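
For readers following the thread, the behaviour the quoted cover letter
describes for enabled=defer can be sketched as a small decision table. This
is only an illustrative model, not the kernel implementation: the names
ThpMode, fault_allocates_hugepage and khugepaged_may_collapse are
hypothetical, and the model deliberately ignores everything except the
global mode and whether the mapping is MADV_HUGEPAGE (no per-size mTHP
settings, no MADV_NOHUGEPAGE, no max_ptes_none).

```python
from enum import Enum


class ThpMode(Enum):
    """Hypothetical model of the global 'enabled' setting, including 'defer'."""
    ALWAYS = "always"
    DEFER = "defer"
    MADVISE = "madvise"
    NEVER = "never"


def fault_allocates_hugepage(mode: ThpMode, madv_hugepage: bool) -> bool:
    """Return True if the page fault handler may try a huge page.

    Per the cover letter, 'defer' treats MADV_HUGEPAGE mappings normally
    and falls back to base pages for everything else.
    """
    if mode is ThpMode.ALWAYS:
        return True
    if mode in (ThpMode.DEFER, ThpMode.MADVISE):
        return madv_hugepage
    return False  # ThpMode.NEVER


def khugepaged_may_collapse(mode: ThpMode, madv_hugepage: bool) -> bool:
    """Return True if khugepaged may later collapse the mapping.

    Per the cover letter, under 'defer' khugepaged can still operate on
    mappings that are not MADV_HUGEPAGE; that is the difference from
    'madvise' in this simplified model.
    """
    if mode in (ThpMode.ALWAYS, ThpMode.DEFER):
        return True
    if mode is ThpMode.MADVISE:
        return madv_hugepage
    return False  # ThpMode.NEVER


if __name__ == "__main__":
    # Print the table for a mapping without MADV_HUGEPAGE.
    for mode in ThpMode:
        print(mode.value,
              fault_allocates_hugepage(mode, False),
              khugepaged_may_collapse(mode, False))
```

In this model a non-MADV_HUGEPAGE mapping under 'defer' gets base pages at
fault time but stays eligible for collapse later, which is the "middle
ground between always and madvise" the cover letter refers to.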