References: <20250515030312.125567-1-npache@redhat.com>
In-Reply-To: <20250515030312.125567-1-npache@redhat.com>
From: Nico Pache
Date: Wed, 14 May 2025 21:21:14 -0600
Subject: Re: [PATCH v6 00/12] khugepaged: mTHP support
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
    dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
    mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
    willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
    usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
    thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
    kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com,
    anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
    will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
    jglisse@google.com, surenb@google.com, zokeefe@google.com,
    hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
    rdunlap@infradead.org

Ugh... So sorry, I forgot to turn off the chain-reply-to. Resending V7 *facepalm*

On Wed, May 14, 2025 at 9:03 PM Nico Pache wrote:
>
> The following series provides khugepaged and madvise collapse with the
> capability to collapse anonymous memory regions to mTHPs.
>
> To achieve this we generalize the khugepaged functions to no longer depend
> on PMD_ORDER. Then during the PMD scan, we keep track of chunks of pages
> (defined by KHUGEPAGED_MTHP_MIN_ORDER) that are utilized. This info is
> tracked using a bitmap. After the PMD scan is done, we do binary recursion
> on the bitmap to find the optimal mTHP sizes for the PMD range. The
> restriction on max_ptes_none is removed during the scan, to make sure we
> account for the whole PMD range.
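A minimal user-space sketch of the bitmap scan plus binary recursion described above, only to make the idea concrete. Everything specific in it is an assumption, not the series' code: 4K pages (512 PTEs per PMD), a minimum chunk order of 3 (8 PTEs per bit, so one uint64_t covers the PMD range), a chunk counting as utilized when any of its PTEs is present, and an allowance of empty chunks that is halved at each lower order. The helpers build_bitmap(), chunks_used() and find_collapse() are hypothetical; the series itself introduces khugepaged_scan_bitmap for this step.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (not the series' values): 4K pages give 512 PTEs
 * per PMD; a hypothetical minimum chunk order of 3 means each bit
 * covers 8 PTEs and the whole PMD range fits in one 64-bit word. */
#define PMD_PTES  512
#define MIN_ORDER 3
#define NR_CHUNKS (PMD_PTES >> MIN_ORDER) /* 64 chunks */
#define TOP_ORDER 6                       /* log2(NR_CHUNKS) */

/* During the PMD scan: set bit i when chunk i has any present PTE
 * (a simplification of the series' per-chunk threshold). */
static uint64_t build_bitmap(const uint8_t present[PMD_PTES])
{
	uint64_t bitmap = 0;

	for (int pte = 0; pte < PMD_PTES; pte++)
		if (present[pte])
			bitmap |= UINT64_C(1) << (pte >> MIN_ORDER);
	return bitmap;
}

/* Number of utilized chunks in [offset, offset + 2^order). */
static int chunks_used(uint64_t bitmap, int offset, int order)
{
	int used = 0;

	for (int i = offset; i < offset + (1 << order); i++)
		used += (int)((bitmap >> i) & 1);
	return used;
}

/* After the scan: try the region at `order` (counted in chunks, so
 * the mTHP order is order + MIN_ORDER) first; if it has more empty
 * chunks than allowed, recurse into its two halves. max_empty is the
 * allowance at TOP_ORDER and is halved per order step. Each chosen
 * (offset, order) is appended to off[]/ord[] at index n, and the new
 * count is returned. */
static int find_collapse(uint64_t bitmap, int offset, int order, int max_empty,
			 int off[], int ord[], int n)
{
	int size = 1 << order;
	int allowed = max_empty >> (TOP_ORDER - order);

	assert(order >= 0 && order <= TOP_ORDER);
	if (size - chunks_used(bitmap, offset, order) <= allowed) {
		off[n] = offset;
		ord[n] = order;
		return n + 1;
	}
	if (order == 0)
		return n;
	n = find_collapse(bitmap, offset, order - 1, max_empty, off, ord, n);
	return find_collapse(bitmap, offset + size / 2, order - 1, max_empty,
			     off, ord, n);
}
```

Trying the largest region first and splitting only when it is too sparse means a densely populated range collapses once at the largest size, while sparse halves fall through to smaller orders or are skipped.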
> When no mTHP size is enabled, the legacy
> behavior of khugepaged is maintained. max_ptes_none will be scaled by the
> attempted collapse order to determine how full a THP must be to be
> eligible. If an mTHP collapse is attempted but the range contains
> swapped-out or shared pages, we don't perform the collapse.
>
> With the default max_ptes_none=511, the code should keep most of its
> original behavior. To exercise mTHP collapse we need to set
> max_ptes_none<=255. With max_ptes_none > HPAGE_PMD_NR/2 you will
> experience collapse "creep" and constantly promote mTHPs to the next
> available size. This is due to the fact that a collapse at least doubles
> the number of populated pages, so on a future scan the range will satisfy
> that condition once again.
>
> Patch 1: Refactor/rename hpage_collapse
> Patch 2: Some refactoring to combine madvise_collapse and khugepaged
> Patch 3-5: Generalize khugepaged functions for arbitrary orders
> Patch 6-9: The mTHP patches
> Patch 10-11: Tracing/stats
> Patch 12: Documentation
>
> ---------
> Testing
> ---------
> - Built for x86_64, aarch64, ppc64le, and s390x
> - selftests mm
> - I created a test script that I used to push khugepaged to its limits
>   while monitoring a number of stats and tracepoints. The code is
>   available here[1] (run in legacy mode for these changes and set the mTHP
>   sizes to inherit).
>   The summary from my testing was that there was no significant
>   regression noticed through this test. In some cases my changes had
>   better collapse latencies, and were able to scan more pages in the same
>   amount of time/work, but for the most part the results were consistent.
> - redis testing. I tested these changes along with my defer changes
>   (see follow-up post for more details).
> - some basic testing on 64k page size.
> - lots of general use.
>
> V6 Changes:
> - Don't release the anon_vma_lock early (like in the PMD case), as not all
>   pages are isolated.
> - Define the PTE as null to avoid an uninitialized condition
> - minor nits and newline cleanup
> - make sure to unmap and unlock the PTE for the swapin case
> - change the revalidation to always check the PMD order (as this will make
>   sure that no other VMA spans it)
>
> V5 Changes [2]:
> - switched the order of patches 1 and 2
> - fixed some edge cases on the unified madvise_collapse and khugepaged
> - Explained the "creep" some more in the docs
> - fix EXCEED_SHARED vs EXCEED_SWAP accounting issue
> - fix potential highmem issue caused by an early unmap of the PTE
>
> V4 Changes:
> - Rebased onto mm-unstable
> - small changes to Documentation
>
> V3 Changes:
> - corrected legacy behavior for khugepaged and madvise_collapse
> - added proper mTHP stat tracking
> - Minor changes to prevent a nested lock on non-split-lock arches
> - Took Dev's version of alloc_charge_folio as it has the proper stats
> - Skip cases where trying to collapse to a lower order would still fail
> - Fixed cases where the bitmap was not being updated properly
> - Moved Documentation update to this series instead of the defer set
> - Minor bugs discovered during testing and review
> - Minor "nit" cleanup
>
> V2 Changes:
> - Minor bug fixes discovered during review and testing
> - removed dynamic allocations for bitmaps, and made them stack based
> - Adjusted bitmap offset from u8 to u16 to support 64k pagesize.
> - Updated trace events to include collapsing order info.
> - Scaled max_ptes_none by order rather than scaling to a 0-100 scale.
> - No longer require a chunk to be fully utilized before setting the bit.
>   Use the same max_ptes_none scaling principle to achieve this.
> - Skip mTHP collapse that requires swapin or shared handling. This helps
>   prevent some of the "creep" that was discovered in v1.
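The V2 note about scaling max_ptes_none by order, and the "creep" described earlier in the cover letter, can be checked with a little arithmetic. This sketch assumes 4K pages (HPAGE_PMD_ORDER of 9) and assumes the scaled limit is max_ptes_none >> (HPAGE_PMD_ORDER - order); scaled_max_none() and creeps() are hypothetical helpers, and the swap and shared-page checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K pages, so a PMD maps 512 PTEs (order 9). */
#define HPAGE_PMD_ORDER 9

/* Assumed scaling: empty (none) PTEs tolerated when collapsing to
 * `order` is max_ptes_none shifted down by the distance from the
 * PMD order. */
static unsigned int scaled_max_none(unsigned int max_ptes_none, int order)
{
	assert(order >= 0 && order <= HPAGE_PMD_ORDER);
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

/* Creep check: a fully populated mTHP of `order` sits in an aligned
 * region of order + 1 that is exactly half empty. If that empty half
 * is within the scaled limit for order + 1, a later scan may promote
 * the mTHP to the next size. */
static bool creeps(unsigned int max_ptes_none, int order)
{
	unsigned int none = 1u << order; /* empty half of the larger region */

	assert(order >= 0 && order < HPAGE_PMD_ORDER);
	return none <= scaled_max_none(max_ptes_none, order + 1);
}
```

With max_ptes_none=511 a fully populated 16-page mTHP (order 4) leaves 16 empty PTEs in its aligned 32-page region, within the scaled limit of 31, so it can grow on a later scan; with 255 the limit at order 5 is 15 and it stays put.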
>
> [1] - https://gitlab.com/npache/khugepaged_mthp_test
> [2] - https://lore.kernel.org/all/20250428181218.85925-1-npache@redhat.com/
>
> Dev Jain (1):
>   khugepaged: generalize alloc_charge_folio()
>
> Nico Pache (11):
>   khugepaged: rename hpage_collapse_* to khugepaged_*
>   introduce khugepaged_collapse_single_pmd to unify khugepaged and
>     madvise_collapse
>   khugepaged: generalize hugepage_vma_revalidate for mTHP support
>   khugepaged: generalize __collapse_huge_page_* for mTHP support
>   khugepaged: introduce khugepaged_scan_bitmap for mTHP support
>   khugepaged: add mTHP support
>   khugepaged: skip collapsing mTHP to smaller orders
>   khugepaged: avoid unnecessary mTHP collapse attempts
>   khugepaged: improve tracepoints for mTHP orders
>   khugepaged: add per-order mTHP khugepaged stats
>   Documentation: mm: update the admin guide for mTHP collapse
>
>  Documentation/admin-guide/mm/transhuge.rst |  14 +-
>  include/linux/huge_mm.h                    |   5 +
>  include/linux/khugepaged.h                 |   4 +
>  include/trace/events/huge_memory.h         |  34 +-
>  mm/huge_memory.c                           |  11 +
>  mm/khugepaged.c                            | 472 ++++++++++++++-------
>  6 files changed, 382 insertions(+), 158 deletions(-)
>
> --
> 2.49.0
>
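On the V2 change that widened the bitmap offset from u8 to u16 "to support 64k pagesize": the arithmetic below assumes the offset counts chunks of 1 << chunk_order PTEs, with chunk_order standing in for KHUGEPAGED_MTHP_MIN_ORDER (its value is assumed to be 2 here). With 4K pages a PMD maps 512 PTEs; on arm64 with a 64K granule it maps 8192, so under these assumptions the largest chunk index no longer fits in 8 bits. max_chunk_offset(), fits_u8() and fits_u16() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest chunk index within one PMD range. pmd_shift - page_shift is
 * the PMD order (21 - 12 = 9 for 4K pages, 29 - 16 = 13 for arm64 64K
 * pages); chunk_order is an assumed value for the minimum mTHP order. */
static uint32_t max_chunk_offset(int page_shift, int pmd_shift, int chunk_order)
{
	assert(chunk_order >= 0 && pmd_shift - page_shift >= chunk_order);
	return (UINT32_C(1) << (pmd_shift - page_shift - chunk_order)) - 1;
}

/* Does a chunk offset fit in the given field width? */
static bool fits_u8(uint32_t v)  { return v <= UINT8_MAX; }
static bool fits_u16(uint32_t v) { return v <= UINT16_MAX; }
```

For example, max_chunk_offset(12, 21, 2) is 127 and fits a u8, while max_chunk_offset(16, 29, 2) is 2047 and needs a u16.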