From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org,
    mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com,
    baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com,
    willy@infradead.org, peterx@redhat.com, shuah@kernel.org, ziy@nvidia.com,
    wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com,
    vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com,
    yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com,
    aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com,
    anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
    will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
    jglisse@google.com, surenb@google.com, zokeefe@google.com,
    Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, hannes@cmpxchg.org,
    rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org
Subject: [PATCH v5 0/4] mm: introduce THP deferred setting
Date: Mon, 28 Apr 2025 12:29:00 -0600
Message-ID: <20250428182904.93989-1-npache@redhat.com>

This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl configs
to make sense. Without it, the global="defer" and mTHP="inherit" case is
"undefined" behavior.

We've seen cases where customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /transparent_hugepages/enabled=never. This is
in part due to performance degradations and increased memory waste.

This series introduces enabled=defer. This setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the allocation is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages that are not MADV_HUGEPAGE.

This allows for three things. One, applications specifically designed to
use hugepages will get them. Two, applications that don't use hugepages
can still benefit from them without aggressively inserting THPs at every
possible chance. This curbs the memory waste and defers the use of
hugepages to khugepaged, which can then scan memory for ranges eligible
for collapse. Lastly, there is the added benefit for those who want THPs
but experience higher latency PFs. Now you can get base page performance
at the PF handler and hugepage performance for those mappings after they
collapse.

Admins may want to lower max_ptes_none; if not, khugepaged may
aggressively collapse single allocations into hugepages.

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use.
- redis testing. This test was my original case for the defer mode. What I
  was able to prove was that THP=always leads to increased max_latency
  cases; hence why it is recommended to disable THPs for redis servers.
  However, with 'defer' we don't have the max_latency spikes and can still
  get the system to utilize THPs. I further tested this with the mTHP
  defer setting and found that redis (and probably other jemalloc users)
  can utilize THPs via defer (+mTHP defer) without a large latency
  penalty and with some potential gains. I uploaded some mmtest results
  here [3], which compare:
      stock+thp=never
      stock+(m)thp=always
      khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in
  some cases, but also have gains in other cases. The mTHP+defer results
  have more gains and fewer losses than the (m)THP=always case.

V5 Changes:
- rebased dependent series
- added reviewed-by tag on 2/4

V4 Changes:
- Minor Documentation fixes
- rebased the dependent series [1] onto mm-unstable
  commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")

V3 Changes:
- Combined the documentation commits into one, and moved a section to the
  khugepaged mthp patchset

V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lore.kernel.org/lkml/20250428181218.85925-1-npache@redhat.com/
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html

Nico Pache (4):
  mm: defer THP insertion to khugepaged
  mm: document (m)THP defer usage
  khugepaged: add defer option to mTHP options
  selftests: mm: add defer to thp setting parser

 Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            |  8 +--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 106 insertions(+), 22 deletions(-)

-- 
2.48.1
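
As a rough illustration of the policy described in this cover letter, the
following Python toy model (not kernel code) tabulates, for each global
"enabled" value, whether the page fault handler may try a hugepage and
whether khugepaged may later collapse the range. The names ThpMode,
fault_may_use_hugepage and khugepaged_may_collapse are hypothetical and
exist only in this sketch. It assumes the existing always/madvise/never
semantics and ignores per-size mTHP settings, MADV_NOHUGEPAGE,
max_ptes_none and every other eligibility check.

```python
from enum import Enum


class ThpMode(Enum):
    """Global THP 'enabled' values; DEFER is the mode this series proposes."""
    ALWAYS = "always"
    DEFER = "defer"
    MADVISE = "madvise"
    NEVER = "never"


def fault_may_use_hugepage(mode: ThpMode, madv_hugepage: bool) -> bool:
    """Toy model: may the page fault handler try a hugepage allocation?"""
    if mode is ThpMode.ALWAYS:
        return True
    if mode in (ThpMode.DEFER, ThpMode.MADVISE):
        # In both modes only MADV_HUGEPAGE mappings get a hugepage at fault.
        return madv_hugepage
    return False  # NEVER


def khugepaged_may_collapse(mode: ThpMode, madv_hugepage: bool) -> bool:
    """Toy model: may khugepaged consider this mapping for collapse?"""
    if mode in (ThpMode.ALWAYS, ThpMode.DEFER):
        # DEFER differs from MADVISE here: non-madvised ranges stay eligible.
        return True
    if mode is ThpMode.MADVISE:
        return madv_hugepage
    return False  # NEVER


if __name__ == "__main__":
    # Print the decision table for every mode and madvise state.
    for mode in ThpMode:
        for madv in (False, True):
            print(f"{mode.value:8} madv={madv!s:5} "
                  f"fault={fault_may_use_hugepage(mode, madv)!s:5} "
                  f"khugepaged={khugepaged_may_collapse(mode, madv)}")
```

Under these assumptions, a mapping without MADV_HUGEPAGE in defer mode gets
base pages at fault time but remains a khugepaged candidate, which is the
combination that neither always nor madvise provides.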