From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, "Matthew Wilcox (Oracle)",
 Hugh Dickins, Yin Fengwei, Mike Kravetz, Muchun Song, Peter Xu
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul
Date: Mon, 4 Dec 2023 19:53:01 +0000
Message-ID: <993ea322-8cdb-4ab1-84d3-0a1cb40049c9@arm.com>
In-Reply-To: <20231204142146.91437-1-david@redhat.com>
References: <20231204142146.91437-1-david@redhat.com>

On 04/12/2023 14:21, David Hildenbrand wrote:
> Based on mm-stable from a couple of days ago.
>
> This series proposes an overhaul to our rmap interface, to get rid of the
> "bool compound" / RMAP_COMPOUND parameter, with the goal of making the
> interface less error prone, more future proof, and more natural to extend
> to "batching". Also, this converts the interface to always consume
> folio+subpage, which speeds up operations on large folios.
>
> Further, this series adds PTE-batching variants for 4 rmap functions,
> whereby only folio_add_anon_rmap_ptes() is used for batching in this
> series when PTE-remapping a PMD-mapped THP.

I certainly support the objective you have here: making the interfaces
clearer, more consistent and more amenable to batching. I'll try to find
some time this week to review.
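Just to check I've understood the overall shape of the change from the
cover letter alone: today the mapping granularity is hidden behind a
flag on a page-based call, and afterwards it is explicit in the function
name. Something like the below, where the "after" signatures are my
guess from the naming, not taken from the patches:

	/* Before: granularity selected by RMAP_COMPOUND / bool compound. */
	page_add_anon_rmap(page, vma, addr, RMAP_COMPOUND);

	/* After (guessed): granularity spelled out in the name. */
	folio_add_anon_rmap_pmd(folio, page, vma, addr, flags);
	folio_add_anon_rmap_pte(folio, page, vma, addr, flags);

If so, that seems much harder to misuse.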

> Ryan has two series where we would make use of folio_remove_rmap_ptes() [1]
> -- he carries his own batching variant right now -- and
> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].

Note that the contpte series at [2] has a new patch in v3 (patch 2), which
could benefit from folio_remove_rmap_ptes() or equivalent. My plan was to
revive [1] on top of [2] once it is merged.

> There is some overlap with both series (and some other work, like
> multi-size THP [3]), so that will need some coordination, and likely a
> stepwise inclusion.

Selfishly, I'd really like to get my stuff merged as soon as there is no
technical reason not to. I'd prefer not to add this as a dependency if we
can help it.

> I got that started [4], but it made sense to show the whole picture. The
> patches of [4] are contained in here, with one additional patch added
> ("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()") and some
> slight patch description changes.
>
> In general, RMAP batching is an important optimization for PTE-mapped
> THP, especially once we want to move towards a total mapcount or further,
> as shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].
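Agreed. To make the batching win concrete (this is my sketch of the
idea, not code from your series, and I'm guessing at the exact
signatures): when fork() copies a PTE-mapped THP today it makes one rmap
call per PTE, but a caller that has already identified a run of nr
contiguous PTEs mapping the same folio could do it in a single call:

	/* Today, schematically: one rmap call per subpage. */
	for (i = 0; i < nr; i++)
		if (page_try_dup_anon_rmap(page + i, false, src_vma))
			goto fallback;

	/* With a _ptes variant: one call covering the whole run. */
	if (folio_try_dup_anon_rmap_ptes(folio, page, nr, src_vma))
		goto fallback;

which should amortise the per-call accounting across the whole batch.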

> The rmap batching part of [5] is also contained here in a slightly
> reworked form [and I found a bug due to the "compound" parameter handling
> in these patches that should be fixed here :) ].
>
> This series performs a lot of folio conversion that could be separated
> out if there is a good reason. Most of the added LOC in the diff are only
> due to documentation.
>
> As we're moving to a pte/pmd interface where we clearly express the
> mapping granularity we are dealing with, we first get the remainder of
> hugetlb out of the way, as it is special and expected to remain special:
> it treats everything as a "single logical PTE" and currently only allows
> entire mappings.
>
> Even if we'd ever support partial mappings, I strongly assume the
> interface and implementation will still differ heavily: hopefully we can
> avoid working on subpages/subpage mapcounts completely and only add a
> "count" parameter for them to enable batching.
>
> New (extended) hugetlb interface that operates on the entire folio:
> * hugetlb_add_new_anon_rmap() -> Already existed
> * hugetlb_add_anon_rmap() -> Already existed
> * hugetlb_try_dup_anon_rmap()
> * hugetlb_try_share_anon_rmap()
> * hugetlb_add_file_rmap()
> * hugetlb_remove_rmap()
>
> New "ordinary" interface for small folios / THP:
> * folio_add_new_anon_rmap() -> Already existed
> * folio_add_anon_rmap_[pte|ptes|pmd]()
> * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
> * folio_try_share_anon_rmap_[pte|pmd]()
> * folio_add_file_rmap_[pte|ptes|pmd]()
> * folio_dup_file_rmap_[pte|ptes|pmd]()
> * folio_remove_rmap_[pte|ptes|pmd]()

I'm not sure if there are official guidelines, but personally if we are
reworking the API, I'd take the opportunity to move "rmap" to the front
of the name, rather than having it buried in the middle as it is for
some of these:

rmap_hugetlb_*()
rmap_folio_*()

I guess reading the patches will tell me, but what's the point of "ptes"?
Surely you're either mapping at pte or pmd level, and the number of pages
is determined by the folio size? (Or presumably by an nr param passed in?
See the sketch below for what I'm imagining.)
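To illustrate - and these signatures are guesses from the naming, not
lifted from the patches:

	/*
	 * Guess: the _ptes variant takes an explicit nr_pages so a caller
	 * can batch over a contiguous sub-range of the folio's pages;
	 * _pte would then just be the nr_pages == 1 case.
	 */
	void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
			int nr_pages, struct vm_area_struct *vma);

	#define folio_add_file_rmap_pte(folio, page, vma) \
		folio_add_file_rmap_ptes(folio, page, 1, vma)

	void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
			struct vm_area_struct *vma);

If that's right, "ptes" is what lets call sites batch without
open-coding a loop, which answers my own question.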

Thanks,
Ryan

> folio_add_new_anon_rmap() will always map at the biggest granularity
> possible (currently, a single PMD to cover a PMD-sized THP). Could be
> extended if ever required.
>
> In the future, we might want "_pud" variants and eventually "_pmds"
> variants for batching. Further, if hugepd is ever a thing outside
> hugetlb code, we might want some variants for that. All stuff for the
> distant future.
>
> I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
> 4210R: munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE-
> remapping PMD-mapped THPs on 1 GiB of memory.
>
> For small folios, there is barely a change (< 1% performance
> improvement), whereby fork() still stands out with a 0.74% performance
> improvement, but it might just be noise. Folio optimizations don't help
> that much with small folios.
>
> For PTE-mapped THP:
> * PTE-remapping a PMD-mapped THP is more than 10% faster.
>   -> RMAP batching
> * fork() is more than 4% faster.
>   -> folio conversion
> * MADV_DONTNEED is 2% faster.
>   -> folio conversion
> * COW by writing only a single byte on a COW-shared PTE.
>   -> folio conversion
> * munmap() is only slightly faster (< 1%).
>
> [1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
> [2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com
> [3] https://lkml.kernel.org/r/20231204102027.57185-1-ryan.roberts@arm.com
> [4] https://lkml.kernel.org/r/20231128145205.215026-1-david@redhat.com
> [5] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com
>
> Cc: Andrew Morton
> Cc: "Matthew Wilcox (Oracle)"
> Cc: Hugh Dickins
> Cc: Ryan Roberts
> Cc: Yin Fengwei
> Cc: Mike Kravetz
> Cc: Muchun Song
> Cc: Peter Xu
>
> David Hildenbrand (39):
>   mm/rmap: rename hugepage_add* to hugetlb_add*
>   mm/rmap: introduce and use hugetlb_remove_rmap()
>   mm/rmap: introduce and use hugetlb_add_file_rmap()
>   mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
>   mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
>   mm/rmap: add hugetlb sanity checks
>   mm/rmap: convert folio_add_file_rmap_range() into
>     folio_add_file_rmap_[pte|ptes|pmd]()
>   mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
>   mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
>   mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
>   mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
>   mm/rmap: remove page_add_file_rmap()
>   mm/rmap: factor out adding folio mappings into __folio_add_rmap()
>   mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
>   mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
>   mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
>   mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>   mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>   mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>   mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>   mm/rmap: remove page_add_anon_rmap()
>   mm/rmap: remove RMAP_COMPOUND
>   mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
>   kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
>   mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
>   mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
>   mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
>   mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
>   mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
>   mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
>   Documentation: stop referring to page_remove_rmap()
>   mm/rmap: remove page_remove_rmap()
>   mm/rmap: convert page_dup_file_rmap() to
>     folio_dup_file_rmap_[pte|ptes|pmd]()
>   mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>   mm/huge_memory: page_try_dup_anon_rmap() ->
>     folio_try_dup_anon_rmap_pmd()
>   mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
>   mm/rmap: remove page_try_dup_anon_rmap()
>   mm: convert page_try_share_anon_rmap() to
>     folio_try_share_anon_rmap_[pte|pmd]()
>   mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
>
>  Documentation/mm/transhuge.rst       |   4 +-
>  Documentation/mm/unevictable-lru.rst |   4 +-
>  include/linux/mm.h                   |   6 +-
>  include/linux/rmap.h                 | 380 +++++++++++++++++++-----
>  kernel/events/uprobes.c              |   2 +-
>  mm/gup.c                             |   2 +-
>  mm/huge_memory.c                     |  85 +++---
>  mm/hugetlb.c                         |  21 +-
>  mm/internal.h                        |  12 +-
>  mm/khugepaged.c                      |  17 +-
>  mm/ksm.c                             |  15 +-
>  mm/memory-failure.c                  |   4 +-
>  mm/memory.c                          |  60 ++--
>  mm/migrate.c                         |  12 +-
>  mm/migrate_device.c                  |  41 +--
>  mm/mmu_gather.c                      |   2 +-
>  mm/rmap.c                            | 422 ++++++++++++++++----------
>  mm/swapfile.c                        |   2 +-
>  mm/userfaultfd.c                     |   2 +-
>  19 files changed, 709 insertions(+), 384 deletions(-)
>