From: Itaru Kitayama
To: "Yin, Fengwei"
Cc: Ryan Roberts, Andrew Morton, Matthew Wilcox, David Hildenbrand, Yu Zhao,
 Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
 Luis Chamberlain, "Kirill A. Shutemov", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v5 0/5] variable-order, large folios for anonymous memory
Date: Wed, 16 Aug 2023 20:57:09 +0900
Message-Id: <6DEE431A-1089-4DB4-8BE3-152569576E53@gmail.com>
In-Reply-To: <3a0ada31-0ec5-4a7a-ab9d-d59c3684b662@intel.com>
References: <3a0ada31-0ec5-4a7a-ab9d-d59c3684b662@intel.com>

> On Aug 16, 2023, at 18:25, Yin, Fengwei wrote:
>
>> On 8/16/2023 4:11 PM, Itaru Kitayama wrote:
>>
>>
>>>> On Aug 10, 2023, at 23:29, Ryan Roberts wrote:
>>>
>>> Hi All,
>>>
>>> This is v5 of a series to implement variable order, large folios for anonymous
>>> memory (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
>>> The objective of this is to improve performance by allocating larger chunks of
>>> memory during anonymous page faults:
>>>
>>> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>>>    pages, there are efficiency savings to be had: fewer page faults, batched
>>>    PTE and RMAP manipulation, shorter LRU lists, etc. In short, we reduce
>>>    kernel overhead. This should benefit all architectures.
>>> 2) Since we are now mapping physically contiguous chunks of memory, we can
>>>    take advantage of HW TLB compression techniques. A reduction in TLB
>>>    pressure speeds up kernel and user space. arm64 systems have two
>>>    mechanisms to coalesce TLB entries: "the contiguous bit" (architectural)
>>>    and HPA (uarch).
>>>
>>> This patch set deals with the SW side of things (1). (2) is being tackled in a
>>> separate series. The new behaviour is hidden behind a new Kconfig switch,
>>> LARGE_ANON_FOLIO, which is disabled by default, although the eventual aim is
>>> to enable it by default.
>>>
>>> My hope is that we are pretty much there with the changes at this point;
>>> hopefully this is sufficient to get an initial version merged so that we can
>>> scale up characterization efforts, although they should not be merged until
>>> the prerequisites are complete. These are in progress and tracked at [5].
>>>
>>> This series is based on mm-unstable (ad3232df3e41).
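To make the "fewer page faults" point concrete, here is a minimal userspace sketch in C. It assumes 4 KiB base pages. It also uses an idealised model in which every anonymous fault maps one whole, naturally aligned folio of the chosen order; the series itself falls back to order-0 in several cases. The `sketch_*` names are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB base pages (PAGE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Bytes covered by one folio of the given order (PAGE_SIZE << order). */
static unsigned long sketch_folio_bytes(unsigned int order)
{
	/* Guard against shifting past the width of unsigned long. */
	assert(order < 8 * sizeof(unsigned long) - SKETCH_PAGE_SHIFT);
	return SKETCH_PAGE_SIZE << order;
}

/* Start of the naturally aligned folio of `order` that would cover `addr`. */
static unsigned long sketch_folio_start(unsigned long addr, unsigned int order)
{
	return addr & ~(sketch_folio_bytes(order) - 1);
}

/*
 * Faults needed to first-touch [start, start + len) if every fault maps one
 * whole, naturally aligned folio of `order`. Idealised: ignores every case
 * where the real code falls back to order-0.
 */
static unsigned long sketch_faults_to_populate(unsigned long start,
					       unsigned long len,
					       unsigned int order)
{
	unsigned long size = sketch_folio_bytes(order);
	unsigned long first = sketch_folio_start(start, order);
	/* Round the end of the range up to the next folio boundary. */
	unsigned long last = (start + len + size - 1) & ~(size - 1);

	return (last - first) / size;
}
```

The model gives the following results:

- First-touching 1 MiB takes 256 faults at order-0 but only 32 at order-3. Order-3 corresponds to the 32 KiB folios the cow selftest reports.
- A 32 KiB range starting at 4 KiB is not folio-aligned, so it still straddles two folios.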
>>>
>>> I'm going to be out on holiday from the end of today, returning on 29th
>>> August, so responses will likely be patchy, as I'm terrified of posting
>>> to the list from my phone!
>>>
>>>
>>> Testing
>>> -------
>>>
>>> This version adds patches to mm selftests so that the cow tests explicitly
>>> test large anon folios, in the same way that THP is tested. When enabled, you
>>> should see something similar to this at the start of the test suite:
>>>
>>>   # [INFO] detected large anon folio size: 32 KiB
>>>
>>> Then the following results are expected. The fails and skips are due to
>>> existing issues in mm-unstable:
>>>
>>>   # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0
>>>
>>> Existing mm selftests reveal one regression in the khugepaged tests when
>>> LARGE_ANON_FOLIO is enabled:
>>>
>>>   Run test: collapse_max_ptes_none (khugepaged:anon)
>>>   Maybe collapse with max_ptes_none exceeded.... Fail
>>>   Unexpected huge page
>>>
>>> I believe this is because khugepaged currently skips non-order-0 pages when
>>> looking for collapse opportunities; it should get fixed with the help of
>>> DavidH's work to create a mechanism to precisely determine shared vs.
>>> exclusive pages.
>>>
>>>
>>> Changes since v4 [4]
>>> --------------------
>>>
>>> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>>>   now uses the default order-3 size. I have moved this patch over to
>>>   the contpte series.
>>> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>>>   into the series. I originally removed this at v2 to add to a separate
>>>   series, but that series has transformed significantly and it no longer
>>>   fits, so I am bringing it back here.
>>> - Reintroduced dependency on set_ptes(); originally dropped this at v2, but
>>>   set_ptes() is in mm-unstable now.
>>> - Updated policy for when to allocate LAF; only fall back to order-0 if
>>>   MADV_NOHUGEPAGE is present or if THP is disabled via prctl; no longer rely
>>>   on sysfs's never/madvise/always knob.
>>> - Fall back to order-0 whenever uffd is armed for the VMA, not just when
>>>   uffd-wp is set on the PTE.
>>> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>>>   with ERR_PTR().
>>>
>>> The last 3 changes were proposed by Yu Zhao - thanks!
>>>
>>>
>>> Changes since v3 [3]
>>> --------------------
>>>
>>> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
>>> - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
>>>   sysctl is preferable, but we will wait until a real workload needs it.
>>> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
>>> - Added mm selftests for large anon folios in the cow test suite.
>>>
>>>
>>> Changes since v2 [2]
>>> --------------------
>>>
>>> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>>>   - Huang, Ying suggested the "batch zap" work (which I dropped from this
>>>     series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>>>     moved the deferred split patch to a separate series along with the batch
>>>     zap changes. I plan to submit this series early next week.
>>> - Changed folio order fallback policy
>>>   - We no longer iterate from preferred to 0 looking for acceptable policy
>>>   - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
>>> - Removed vma parameter from arch_wants_pte_order()
>>> - Added command line parameter `flexthp_unhinted_max`
>>>   - clamps preferred order when vma hasn't explicitly opted-in to THP
>>> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>>>   for the process or system).
>>> - Simplified implementation and integration with do_anonymous_page()
>>> - Removed dependency on set_ptes()
>>>
>>>
>>> Changes since v1 [1]
>>> --------------------
>>>
>>> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>>>   - replaced with arch-independent alloc_anon_folio()
>>>     - follows THP allocation approach
>>> - no longer retry with intermediate orders if allocation fails
>>>   - fall back directly to order-0
>>> - remove folio_add_new_anon_rmap_range() patch
>>>   - instead add its new functionality to folio_add_new_anon_rmap()
>>> - remove batch-zap pte mappings optimization patch
>>>   - remove enabler folio_remove_rmap_range() patch too
>>>   - These offer a real perf improvement, so I will submit them separately
>>> - simplify Kconfig
>>>   - single FLEXIBLE_THP option, which is independent of arch
>>>   - depends on TRANSPARENT_HUGEPAGE
>>>   - when enabled, default to max anon folio size of 64K unless arch
>>>     explicitly overrides
>>> - simplify changes to do_anonymous_page():
>>>   - no more retry loop
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@arm.com/
>>> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/
>>> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@arm.com/
>>> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
>>> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>>
>>>
>>> Thanks,
>>> Ryan
>>>
>>> Ryan Roberts (5):
>>>   mm: Allow deferred splitting of arbitrary large anon folios
>>>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>>>   mm: LARGE_ANON_FOLIO for improved performance
>>>   selftests/mm/cow: Generalize do_run_with_thp() helper
>>>   selftests/mm/cow: Add large anon folio tests
>>>
>>>  include/linux/pgtable.h          |  13 ++
>>>  mm/Kconfig                       |  10 ++
>>>  mm/memory.c                      | 144 +++++++++++++++++--
>>>  mm/rmap.c                        |  31 +++--
>>>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
>>>  5 files changed, 347 insertions(+), 80 deletions(-)
>>>
>>> --
>>> 2.25.1
>>>
>>
>> I know Ryan is away currently, but as I can’t find the base commit mentioned
>> in the cover letter, can anybody point me to it so I can use b4 to apply the
>> series and test it?
>>
> Ryan mentioned: This series is based on mm-unstable (ad3232df3e41).

I couldn’t find that commit in the mm-unstable branch I checked out today. I’m
trying to use Andrew’s mm tree for the first time in a decade, so I may well be
doing something wrong.

>
> I believe you can apply the patchset to the latest mm-unstable.

Okay, I will try that.

Thanks,
Itaru.

>
>
> Regards
> Yin, Fengwei
>
>> Thanks,
>> Itaru.
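The v5 changelog describes when a large anon folio is skipped in favour of order-0:

- the VMA has MADV_NOHUGEPAGE;
- THP is disabled for the process via prctl;
- userfaultfd is armed for the VMA.

It also says the sysfs never/madvise/always knob is no longer consulted.

The C fragment below is a minimal userspace sketch of just that decision, under those assumptions. `struct sketch_fault_ctx` and `sketch_pick_anon_order()` are hypothetical names, not kernel APIs. `preferred` stands in for the arch-provided order, which the changelog describes as order-3 by default. The sketch omits allocation-failure handling and everything else the real do_anonymous_page() path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state the v5 policy is described as checking. */
struct sketch_fault_ctx {
	bool vma_nohugepage; /* VMA has MADV_NOHUGEPAGE */
	bool thp_disabled;   /* THP disabled for the process via prctl() */
	bool uffd_armed;     /* userfaultfd is armed for the VMA */
};

/*
 * Pick the folio order for an anonymous fault (illustrative only).
 * Any listed condition forces order-0; otherwise the preferred order is
 * returned unchanged. The sysfs never/madvise/always setting is
 * deliberately not an input here.
 */
static unsigned int sketch_pick_anon_order(const struct sketch_fault_ctx *ctx,
					   unsigned int preferred)
{
	assert(ctx != NULL);
	if (ctx->vma_nohugepage || ctx->thp_disabled || ctx->uffd_armed)
		return 0;
	return preferred;
}
```

Keeping the whole policy as one early return mirrors the changelog's "only fall back to order-0 if ..." wording. It makes explicit that nothing other than these per-VMA and per-process conditions changes the outcome.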