Date: Thu, 10 Aug 2023 16:13:39 +0100
Subject: Re: [PATCH v5 0/5] variable-order, large folios for anonymous memory
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230810142942.3169679-1-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>

On 10/08/2023 15:29, Ryan
Roberts wrote:
> Hi All,
>
> This is v5 of a series to implement variable-order, large folios for anonymous
> memory (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
> The objective of this is to improve performance by allocating larger chunks of
> memory during anonymous page faults:
>
> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
>    and RMAP manipulation, shorter LRU lists, etc. In short, we reduce kernel
>    overhead. This should benefit all architectures.
> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>    advantage of HW TLB compression techniques. A reduction in TLB pressure
>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things (1). (2) is being tackled in a
> separate series. The new behaviour is hidden behind a new Kconfig switch,
> LARGE_ANON_FOLIO, which is disabled by default, although the eventual aim is to
> enable it by default.
>
> My hope is that we are pretty much there with the changes at this point;
> hopefully this is sufficient to get an initial version merged so that we can
> scale up characterization efforts. The patches should not be merged until the
> prerequisites are complete, though; these are in progress and tracked at [5].
>
> This series is based on mm-unstable (ad3232df3e41).
>
> I'm going to be out on holiday from the end of today, returning on 29th
> August. So responses will likely be patchy, as I'm terrified of posting
> to list from my phone!
>
>
> Testing
> -------
>
> This version adds patches to mm selftests so that the cow tests explicitly test
> large anon folios, in the same way that thp is tested.
> When enabled you should see something similar at the start of the test suite:
>
>   # [INFO] detected large anon folio size: 32 KiB
>
> Then the following results are expected. The fails and skips are due to existing
> issues in mm-unstable:
>
>   # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0

Oops, the above are the results when running with SWAP disabled. This is what
you would normally see when SWAP is enabled:

  # Totals: pass:291 fail:16 xfail:0 xpass:0 skip:1 error:0

>
> Existing mm selftests reveal 1 regression in khugepaged tests when
> LARGE_ANON_FOLIO is enabled:
>
>   Run test: collapse_max_ptes_none (khugepaged:anon)
>   Maybe collapse with max_ptes_none exceeded.... Fail
>   Unexpected huge page
>
> I believe this is because khugepaged currently skips non-order-0 pages when
> looking for collapse opportunities and should get fixed with the help of
> DavidH's work to create a mechanism to precisely determine shared vs exclusive
> pages.
>
>
> Changes since v4 [4]
> --------------------
>
> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>   now uses the default order-3 size. I have moved this patch over to
>   the contpte series.
> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>   into series. I originally removed this at v2 to add to a separate series,
>   but that series has transformed significantly and it no longer fits, so
>   bringing it back here.
> - Reintroduced dependency on set_ptes(); originally dropped this at v2, but
>   set_ptes() is in mm-unstable now.
> - Updated policy for when to allocate LAF; only fall back to order-0 if
>   MADV_NOHUGEPAGE is present or if THP disabled via prctl; no longer rely on
>   sysfs's never/madvise/always knob.
> - Fall back to order-0 whenever uffd is armed for the vma, not just when
>   uffd-wp is set on the pte.
> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>   with ERR_PTR().
>
> The last 3 changes were proposed by Yu Zhao - thanks!
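
To make the v5 allocation policy listed under "Changes since v4" concrete, here is a minimal userspace sketch in C. It is not the code from the patch: `struct vma_hints`, `pick_anon_folio_order()`, `alloc_anon_folio_sketch()` and the lower-case `err_ptr()`/`ptr_err()`/`is_err()` helpers are hypothetical names, the last three only mimicking the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). It also assumes 4 KiB base pages, in which case the default order-3 folio is 32 KiB, matching the size the selftest reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the kernel convention: the top 4095 addresses encode -errno. */
#define SKETCH_MAX_ERRNO 4095
/* "default order-3 size" from the cover letter. */
#define SKETCH_DEFAULT_ORDER 3

/* Hypothetical userspace stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical summary of the per-vma state the policy looks at. */
struct vma_hints {
	bool madv_nohugepage;       /* MADV_NOHUGEPAGE set on the vma */
	bool thp_disabled_by_prctl; /* THP disabled for the process via prctl */
	bool uffd_armed;            /* userfaultfd armed for the vma */
};

/* Simplified placeholder; the real struct folio carries far more state. */
struct folio {
	unsigned int order;
};

/* Fall back to order-0 in the three listed cases; otherwise use the default. */
unsigned int pick_anon_folio_order(const struct vma_hints *hints)
{
	if (hints->madv_nohugepage || hints->thp_disabled_by_prctl ||
	    hints->uffd_armed)
		return 0;
	return SKETCH_DEFAULT_ORDER;
}

/* Size in bytes of a folio of the given order. */
size_t folio_size_bytes(unsigned int order, size_t page_size)
{
	assert(order < sizeof(size_t) * 8);
	return page_size << order;
}

/*
 * Return a folio or an encoded error, never NULL.
 * `simulate_oom` is a test hook that exists only in this sketch.
 */
struct folio *alloc_anon_folio_sketch(const struct vma_hints *hints,
				      bool simulate_oom)
{
	struct folio *folio;

	if (simulate_oom)
		return err_ptr(-ENOMEM);

	folio = calloc(1, sizeof(*folio));
	if (!folio)
		return err_ptr(-ENOMEM);

	folio->order = pick_anon_folio_order(hints);
	return folio;
}
```

A caller of such a function would test the result with `is_err()` before dereferencing it and recover the error code with `ptr_err()`, rather than comparing against NULL.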
>
>
> Changes since v3 [3]
> --------------------
>
> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
> - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
>   sysctl is preferable but we will wait until a real workload needs it.
> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
> - Added mm selftests for large anon folios in cow test suite.
>
>
> Changes since v2 [2]
> --------------------
>
> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>   - Huang, Ying suggested the "batch zap" work (which I dropped from this
>     series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>     moved the deferred split patch to a separate series along with the batch
>     zap changes. I plan to submit this series early next week.
> - Changed folio order fallback policy
>   - We no longer iterate from preferred to 0 looking for acceptable policy
>   - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
> - Removed vma parameter from arch_wants_pte_order()
> - Added command line parameter `flexthp_unhinted_max`
>   - clamps preferred order when vma hasn't explicitly opted-in to THP
> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>   for process or system).
> - Simplified implementation and integration with do_anonymous_page()
> - Removed dependency on set_ptes()
>
>
> Changes since v1 [1]
> --------------------
>
> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>   - replaced with arch-independent alloc_anon_folio()
>     - follows THP allocation approach
> - no longer retry with intermediate orders if allocation fails
>   - fallback directly to order-0
> - remove folio_add_new_anon_rmap_range() patch
>   - instead add its new functionality to folio_add_new_anon_rmap()
> - remove batch-zap pte mappings optimization patch
>   - remove enabler folio_remove_rmap_range() patch too
>   - These offer real perf improvement so will submit separately
> - simplify Kconfig
>   - single FLEXIBLE_THP option, which is independent of arch
>   - depends on TRANSPARENT_HUGEPAGE
>   - when enabled default to max anon folio size of 64K unless arch
>     explicitly overrides
> - simplify changes to do_anonymous_page():
>   - no more retry loop
>
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
>   mm: Allow deferred splitting of arbitrary large anon folios
>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>   mm: LARGE_ANON_FOLIO for improved performance
>   selftests/mm/cow: Generalize do_run_with_thp() helper
>   selftests/mm/cow: Add large anon folio tests
>
>  include/linux/pgtable.h          |  13 ++
>  mm/Kconfig                       |  10 ++
>  mm/memory.c                      | 144 +++++++++++++++++--
>  mm/rmap.c                        |  31 +++--
>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
>  5 files changed, 347 insertions(+), 80 deletions(-)
>
> --
> 2.25.1
>
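
As a quick cross-check of the two "Totals" lines quoted earlier in this mail, the sketch below parses a totals line and sums its counters. Both runs account for the same 308 tests; with SWAP enabled, 84 results move from skip to pass. `struct totals`, `parse_totals()` and `totals_sum()` are hypothetical names for this illustration only, and the format string simply assumes the layout shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counters as they appear in a "# Totals:" summary line. */
struct totals {
	int pass, fail, xfail, xpass, skip, error;
};

/* Parse a "# Totals: ..." line; true only if all six counters were read. */
bool parse_totals(const char *line, struct totals *t)
{
	return sscanf(line,
		      "# Totals: pass:%d fail:%d xfail:%d xpass:%d skip:%d error:%d",
		      &t->pass, &t->fail, &t->xfail, &t->xpass, &t->skip,
		      &t->error) == 6;
}

/* Total number of test results accounted for by the line. */
int totals_sum(const struct totals *t)
{
	return t->pass + t->fail + t->xfail + t->xpass + t->skip + t->error;
}
```

Comparing the sums before comparing individual counters is a cheap way to confirm that two runs executed the same set of tests, so that a shift between skip and pass is not confused with tests appearing or disappearing.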