From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yafang Shao
Date: Tue, 20 May 2025 17:24:16 +0800
Subject: Re: [PATCH v6 0/4] mm: introduce THP deferred setting
To: Nico Pache
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 rientjes@google.com, hannes@cmpxchg.org, lorenzo.stoakes@oracle.com,
 rdunlap@infradead.org, mhocko@suse.com, Liam.Howlett@oracle.com,
 zokeefe@google.com, surenb@google.com, jglisse@google.com, cl@gentwo.org,
 jack@suse.cz, dave.hansen@linux.intel.com, will@kernel.org, tiwai@suse.de,
 catalin.marinas@arm.com, anshuman.khandual@arm.com, dev.jain@arm.com,
 raquini@redhat.com, aarcange@redhat.com, kirill.shutemov@linux.intel.com,
 yang@os.amperecomputing.com, thomas.hellstrom@linux.intel.com,
 vishal.moola@gmail.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, shuah@kernel.org,
 peterx@redhat.com, willy@infradead.org, ryan.roberts@arm.com,
 baolin.wang@linux.alibaba.com, baohua@kernel.org, david@redhat.com,
 mathieu.desnoyers@efficios.com, mhiramat@kernel.org, rostedt@goodmis.org,
 corbet@lwn.net, akpm@linux-foundation.org
References: <20250515033857.132535-1-npache@redhat.com>
In-Reply-To: <20250515033857.132535-1-npache@redhat.com>

On Thu, May 15, 2025 at 11:41 AM Nico Pache wrote:
>
> This series is a follow-up to [1], which adds mTHP support to khugepaged.
> mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
> configs to make sense. Without it, the global="defer" and mTHP="inherit"
> case is "undefined" behavior.
>
> We've seen cases where customers switching from RHEL7 to RHEL8 see a
> significant increase in the memory footprint for the same workloads.
>
> Through our investigations we found that a large contributing factor to
> the increase in RSS was an increase in THP usage.
>
> For workloads like MySQL, or when using allocators like jemalloc, it is
> often recommended to set /transparent_hugepages/enabled=never. This is
> in part due to performance degradations and increased memory waste.
>
> This series introduces enabled=defer; this setting acts as a middle
> ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
> page fault handler will act normally, making a hugepage if possible. If
> the allocation is not MADV_HUGEPAGE, then the page fault handler will
> default to the base size allocation. The caveat is that khugepaged can
> still operate on pages that are not MADV_HUGEPAGE.
>
> This allows for three things... one, applications specifically designed to
> use hugepages will get them, and two, applications that don't use
> hugepages can still benefit from them without aggressively inserting
> THPs at every possible chance. This curbs the memory waste, and defers
> the use of hugepages to khugepaged. Khugepaged can then scan the memory
> for regions eligible for collapse. Lastly there is the added benefit for
> those who want THPs but experience higher latency PFs. Now you can get
> base page performance at the PF handler and Hugepage performance for
> those mappings after they collapse.
>
> Admins may want to lower max_ptes_none; if not, khugepaged may
> aggressively collapse single allocations into hugepages.
>
> TESTING:
> - Built for x86_64, aarch64, ppc64le, and s390x
> - selftests mm
> - In [1] I provided a script [2] that has multiple access patterns
> - lots of general use.
> - redis testing. This test was my original case for the defer mode. What I
>   was able to prove was that THP=always leads to increased max_latency
>   cases; hence why it is recommended to disable THPs for redis servers.
>   However with 'defer' we don't have the max_latency spikes and can still
>   get the system to utilize THPs. I further tested this with the mTHP
>   defer setting and found that redis (and probably other jemalloc users)
>   can utilize THPs via defer (+mTHP defer) without a large latency
>   penalty and some potential gains. I uploaded some mmtest results
>   here[3] which compares:
>     stock+thp=never
>     stock+(m)thp=always
>     khugepaged-mthp + defer (max_ptes_none=64)
>
> The results show that (m)THPs can cause some throughput regression in
> some cases, but also have gains in other cases. The mTHP+defer results
> have more gains and fewer losses over the (m)THP=always case.
>
> V6 Changes:
> - nits
> - rebased dependent series and added review tags
>
> V5 Changes:
> - rebased dependent series
> - added reviewed-by tag on 2/4
>
> V4 Changes:
> - Minor Documentation fixes
> - rebased the dependent series [1] onto mm-unstable
>   commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")
>
> V3 Changes:
> - Combined the documentation commits into one, and moved a section to the
>   khugepaged mthp patchset
>
> V2 Changes:
> - base changes on mTHP khugepaged support
> - Fix selftests parsing issue
> - add mTHP defer option
> - add mTHP defer Documentation
>
> [1] - https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com/
> [2] - https://gitlab.com/npache/khugepaged_mthp_test
> [3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html
>
> Nico Pache (4):
>   mm: defer THP insertion to khugepaged
>   mm: document (m)THP defer usage
>   khugepaged: add defer option to mTHP options
>   selftests: mm: add defer to thp setting parser
>
>  Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
>  include/linux/huge_mm.h                    | 18 +++++-
>  mm/huge_memory.c                           | 69 +++++++++++++++++++---
>  mm/khugepaged.c                            |  8 +--
>  tools/testing/selftests/mm/thp_settings.c  |  1 +
>  tools/testing/selftests/mm/thp_settings.h  |  1 +
>  6 files changed, 106 insertions(+), 22 deletions(-)
>
> --
> 2.49.0
>
>

Hello Nico,

Upon reviewing the series, it occurred to me that BPF could solve this
more cleanly. Adding a 'tva_flags' parameter to the BPF hook would
handle this case and future scenarios without requiring new modes. The
BPF mode could then serve as a unified solution.

-- 
Regards
Yafang
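
For reference, the behaviour the quoted cover letter describes for
enabled=defer can be modelled in a few lines of userspace C. This is only
a sketch under stated assumptions: the enum and both function names are
hypothetical and are not taken from the series or from the kernel, and the
model ignores MADV_NOHUGEPAGE, per-size mTHP settings, max_ptes_none and
every other eligibility check. It only captures the split between what the
page fault handler may do and what khugepaged may do later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_DEFER, THP_ALWAYS };

/* May the page fault handler install a huge page right away? */
static bool fault_may_use_thp(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
	case THP_DEFER:
		/* defer behaves like madvise at fault time */
		return madv_hugepage;
	case THP_NEVER:
		return false;
	}
	assert(!"unknown THP mode");
	return false;
}

/* May khugepaged later collapse base pages in this mapping? */
static bool khugepaged_may_collapse(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_ALWAYS:
	case THP_DEFER:
		/* defer still lets khugepaged work on non-MADV_HUGEPAGE memory */
		return true;
	case THP_MADVISE:
		return madv_hugepage;
	case THP_NEVER:
		return false;
	}
	assert(!"unknown THP mode");
	return false;
}
```

In this model defer differs from madvise only in the second function: a
mapping without MADV_HUGEPAGE gets base pages at fault time in both modes,
but only under defer can khugepaged collapse it afterwards.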