From: Itaru Kitayama
Subject: Re: [PATCH v5 0/5] variable-order, large folios for anonymous memory
Date: Wed, 16 Aug 2023 17:11:09 +0900
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, "Kirill A. Shutemov", linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

> On Aug 10, 2023, at 23:29, Ryan Roberts wrote:
>
> Hi All,
>
> This is v5 of a series to implement variable order, large folios for anonymous
> memory. (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
> The objective of this is to improve performance by allocating larger chunks of
> memory during anonymous page faults:
>
> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
>    and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
>    overhead. This should benefit all architectures.
> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>    advantage of HW TLB compression techniques. A reduction in TLB pressure
>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things (1). (2) is being tackled in a
> separate series. The new behaviour is hidden behind a new Kconfig switch,
> LARGE_ANON_FOLIO, which is disabled by default. Although the eventual aim is to
> enable it by default.
>
> My hope is that we are pretty much there with the changes at this point;
> hopefully this is sufficient to get an initial version merged so that we can
> scale up characterization efforts. Although they should not be merged until the
> prerequisites are complete. These are in progress and tracked at [5].
>
> This series is based on mm-unstable (ad3232df3e41).
>
> I'm going to be out on holiday from the end of today, returning on 29th
> August. So responses will likely be patchy, as I'm terrified of posting
> to list from my phone!
>
>
> Testing
> -------
>
> This version adds patches to mm selftests so that the cow tests explicitly test
> large anon folios, in the same way that thp is tested. When enabled you should
> see something similar at the start of the test suite:
>
>   # [INFO] detected large anon folio size: 32 KiB
>
> Then the following results are expected. The fails and skips are due to existing
> issues in mm-unstable:
>
>   # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0
>
> Existing mm selftests reveal 1 regression in khugepaged tests when
> LARGE_ANON_FOLIO is enabled:
>
>   Run test: collapse_max_ptes_none (khugepaged:anon)
>   Maybe collapse with max_ptes_none exceeded.... Fail
>   Unexpected huge page
>
> I believe this is because khugepaged currently skips non-order-0 pages when
> looking for collapse opportunities and should get fixed with the help of
> DavidH's work to create a mechanism to precisely determine shared vs exclusive
> pages.
>
>
> Changes since v4 [4]
> --------------------
>
>   - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>     now uses the default order-3 size. I have moved this patch over to
>     the contpte series.
>   - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>     into series. I originally removed this at v2 to add to a separate series,
>     but that series has transformed significantly and it no longer fits, so
>     bringing it back here.
>   - Reintroduced dependency on set_ptes(); Originally dropped this at v2, but
>     set_ptes() is in mm-unstable now.
>   - Updated policy for when to allocate LAF; only fallback to order-0 if
>     MADV_NOHUGEPAGE is present or if THP disabled via prctl; no longer rely on
>     sysfs's never/madvise/always knob.
>   - Fallback to order-0 whenever uffd is armed for the vma, not just when
>     uffd-wp is set on the pte.
>   - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>     with ERR_PTR().
>
> The last 3 changes were proposed by Yu Zhao - thanks!
>
>
> Changes since v3 [3]
> --------------------
>
>   - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
>   - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
>     sysctl is preferable but we will wait until real workload needs it.
>   - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
>   - Added mm selftests for large anon folios in cow test suite.
>
>
> Changes since v2 [2]
> --------------------
>
>   - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>       - Huang, Ying suggested the "batch zap" work (which I dropped from this
>         series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>         moved the deferred split patch to a separate series along with the batch
>         zap changes. I plan to submit this series early next week.
>   - Changed folio order fallback policy
>       - We no longer iterate from preferred to 0 looking for acceptable policy
>       - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
>   - Removed vma parameter from arch_wants_pte_order()
>   - Added command line parameter `flexthp_unhinted_max`
>       - clamps preferred order when vma hasn't explicitly opted-in to THP
>   - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>     for process or system).
>   - Simplified implementation and integration with do_anonymous_page()
>   - Removed dependency on set_ptes()
>
>
> Changes since v1 [1]
> --------------------
>
>   - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>   - replaced with arch-independent alloc_anon_folio()
>       - follows THP allocation approach
>   - no longer retry with intermediate orders if allocation fails
>       - fallback directly to order-0
>   - remove folio_add_new_anon_rmap_range() patch
>       - instead add its new functionality to folio_add_new_anon_rmap()
>   - remove batch-zap pte mappings optimization patch
>       - remove enabler folio_remove_rmap_range() patch too
>       - These offer real perf improvement so will submit separately
>   - simplify Kconfig
>       - single FLEXIBLE_THP option, which is independent of arch
>       - depends on TRANSPARENT_HUGEPAGE
>       - when enabled default to max anon folio size of 64K unless arch
>         explicitly overrides
>   - simplify changes to do_anonymous_page():
>       - no more retry loop
>
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
>   mm: Allow deferred splitting of arbitrary large anon folios
>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>   mm: LARGE_ANON_FOLIO for improved performance
>   selftests/mm/cow: Generalize do_run_with_thp() helper
>   selftests/mm/cow: Add large anon folio tests
>
>  include/linux/pgtable.h          |  13 ++
>  mm/Kconfig                       |  10 ++
>  mm/memory.c                      | 144 +++++++++++++++++--
>  mm/rmap.c                        |  31 +++--
>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++--------
>  5 files changed, 347 insertions(+), 80 deletions(-)
>
> --
> 2.25.1
>

I know Ryan is currently away, but I can't find the base commit
(ad3232df3e41) that the cover letter says the series is based on. Can
anybody point me to it so that I can apply the series with b4 and test it?

Thanks,
Itaru.
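
For reference while testing, the v5 allocation policy listed under
"Changes since v4" in the quoted cover letter can be sketched in userspace
C roughly as follows. This is only an illustration, not code from the
series: every name ending in _sketch is hypothetical, the
ERR_PTR()/PTR_ERR()/IS_ERR() definitions are simplified stand-ins for the
kernel helpers of the same names, and the 4 KiB base page size is an
assumption (with it, the default order-3 gives the 32 KiB that the cow
selftest reports).

```c
/* Userspace sketch only; names ending in _sketch are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Assumptions: 4 KiB base pages and the default order-3 preference. */
#define PAGE_SIZE_SKETCH 4096UL
#define DEFAULT_ORDER_SKETCH 3

/* Hypothetical summary of the per-vma state the policy consults. */
struct vma_hints_sketch {
	bool madv_nohugepage;       /* MADV_NOHUGEPAGE set on the vma */
	bool thp_disabled_by_prctl; /* THP disabled for the process via prctl */
	bool uffd_armed;            /* userfaultfd armed for the vma */
};

struct folio_sketch {
	int order;
	size_t bytes;
};

/* Fall back to order-0 in the three listed cases, else use preferred. */
static int anon_folio_order_sketch(const struct vma_hints_sketch *h,
				   int preferred)
{
	if (h->madv_nohugepage || h->thp_disabled_by_prctl || h->uffd_armed)
		return 0;
	return preferred;
}

/* Returns a folio on success, or an errno encoded with ERR_PTR(). */
static struct folio_sketch *
alloc_anon_folio_sketch(const struct vma_hints_sketch *h)
{
	struct folio_sketch *folio;

	assert(h != NULL);
	folio = malloc(sizeof(*folio));
	if (!folio)
		return ERR_PTR(-ENOMEM);
	folio->order = anon_folio_order_sketch(h, DEFAULT_ORDER_SKETCH);
	folio->bytes = PAGE_SIZE_SKETCH << folio->order;
	return folio;
}
```

Under these assumptions a vma with none of the three flags set gets an
order-3 folio, and setting any one of them drops the request to order-0.
Callers check the result with IS_ERR() rather than comparing against NULL.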