From: Barry Song <21cnbao@gmail.com>
Date: Wed, 4 Sep 2024 21:16:19 +1200
Subject: Re: [PATCH v3 2/2] mm: tlb: add tlb swap entries batch async release
To: Zhiguo Jiang
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Will Deacon, "Aneesh Kumar K.V", Nick Piggin, Peter Zijlstra,
	Arnd Bergmann, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, linux-arch@vger.kernel.org,
	cgroups@vger.kernel.org, David Hildenbrand, opensource.kernel@vivo.com
In-Reply-To: <20240805153639.1057-3-justinjiang@vivo.com>
References: <20240805153639.1057-1-justinjiang@vivo.com> <20240805153639.1057-3-justinjiang@vivo.com>

On Tue, Aug 6, 2024 at 3:36 AM Zhiguo Jiang wrote:
>
> One of the main reasons for the prolonged exit of a process with an
> independent mm is the time-consuming release of its swap entries.
> The proportion of swap memory occupied by the process increases over
> time, because high memory pressure triggers reclaim of anonymous folios
> into swap space; e.g., on Android devices we found this proportion can
> reach 60% or more after a period of time. Additionally, the relatively
> lengthy path for releasing swap entries further contributes to the
> time required to release them.
>
> Testing Platform: 8GB RAM
> Testing procedure:
> After booting up, start 15 processes first, and then observe the
> physical memory size occupied by the last-launched process at
> different points in time.
> Example: the process launched last is com.qiyi.video.
> | memory type   | 0min   | 1min   | 5min   | 10min  | 15min  |
> --------------------------------------------------------------
> | VmRSS(KB)     | 453832 | 252300 | 204364 | 199944 | 199748 |
> | RssAnon(KB)   | 247348 | 99296  | 71268  | 67808  | 67660  |
> | RssFile(KB)   | 205536 | 152020 | 132144 | 131184 | 131136 |
> | RssShmem(KB)  | 1048   | 984    | 952    | 952    | 952    |
> | VmSwap(KB)    | 202692 | 334852 | 362880 | 366340 | 366488 |
> | Swap ratio(%) | 30.87% | 57.03% | 63.97% | 64.69% | 64.72% |
> Note: min - minute.
>
> When there are multiple processes with independent mm and high memory
> pressure in the system, launching a process with a large memory
> requirement is likely to trigger the near-instantaneous killing of
> many processes with independent mm. Because the exiting processes
> occupy multiple CPU cores for concurrent execution, important
> non-exiting processes may start to lag.
>
> To solve this problem, we have introduced an asynchronous swap entry
> release mechanism for multiple exiting processes: the swap entries
> occupied by the exiting processes are isolated, cached and handed over
> to an asynchronous kworker to complete the release. This allows the
> exiting processes to finish quickly and release their CPU resources.
> We have validated this modification on Android products and achieved
> the expected benefits.
>
> Testing Platform: 8GB RAM
> Testing procedure:
> After restarting the machine, start 15 app processes first, then start
> the camera app process and monitor its cold start and preview times.
>
> Test data for camera process cold start time (unit: milliseconds):
> | seq    | 1    | 2    | 3    | 4    | 5    | 6    | average |
> | before | 1498 | 1476 | 1741 | 1337 | 1367 | 1655 | 1512    |
> | after  | 1396 | 1107 | 1136 | 1178 | 1071 | 1339 | 1204    |
>
> Test data for camera process preview time (unit: milliseconds):
> | seq    | 1    | 2    | 3    | 4    | 5    | 6    | average |
> | before | 267  | 402  | 504  | 513  | 161  | 265  | 352     |
> | after  | 188  | 223  | 301  | 203  | 162  | 154  | 205     |
>
> Based on the averages of the six test runs above, the benefits of this
> patch are:
> 1. The cold start time of the camera app process is reduced by about 20%.
> 2. The preview time of the camera app process is reduced by about 42%.
>
> It offers several benefits:
> 1. Alleviates the high system CPU load caused by multiple exiting
>    processes running simultaneously.
> 2. Reduces lock contention on the swap entry free path by using one
>    asynchronous kworker instead of multiple exiting processes running
>    in parallel.
> 3. Releases the pte_present memory occupied by exiting processes more
>    efficiently.
>
> Signed-off-by: Zhiguo Jiang
> ---
>  arch/s390/include/asm/tlb.h |   8 +
>  include/asm-generic/tlb.h   |  44 ++++++
>  include/linux/mm_types.h    |  58 +++++++
>  mm/memory.c                 |   3 +-
>  mm/mmu_gather.c             | 296 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 408 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index e95b2c8081eb..3f681f63390f
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -28,6 +28,8 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>  		struct page *page, bool delay_rmap, int page_size);
>  static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
>  		struct page *page, unsigned int nr_pages, bool delay_rmap);
> +static inline bool __tlb_remove_swap_entries(struct mmu_gather *tlb,
> +		swp_entry_t entry, int nr);
>
>  #define tlb_flush tlb_flush
>  #define pte_free_tlb pte_free_tlb
> @@ -69,6 +71,12 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
>  	return false;
>  }
>
> +static inline bool __tlb_remove_swap_entries(struct mmu_gather *tlb,
> +		swp_entry_t entry, int nr)
> +{
> +	return false;
> +}
> +
>  static inline void tlb_flush(struct mmu_gather *tlb)
>  {
>  	__tlb_flush_mm_lazy(tlb->mm);
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 709830274b75..8b4d516b35b8
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -294,6 +294,37 @@ extern void tlb_flush_rmaps(struct mmu_gather *tlb, struct vm_area_struct *vma);
>  static inline void tlb_flush_rmaps(struct mmu_gather *tlb, struct vm_area_struct *vma) { }
>  #endif
>
> +#ifndef CONFIG_MMU_GATHER_NO_GATHER
> +struct mmu_swap_batch {
> +	struct mmu_swap_batch *next;
> +	unsigned int nr;
> +	unsigned int max;
> +	encoded_swpentry_t encoded_entrys[];
> +};
> +
> +#define MAX_SWAP_GATHER_BATCH	\
> +	((PAGE_SIZE - sizeof(struct mmu_swap_batch)) / sizeof(void *))
> +
> +#define MAX_SWAP_GATHER_BATCH_COUNT	(10000UL / MAX_SWAP_GATHER_BATCH)
> +
> +struct mmu_swap_gather {
> +	/*
> +	 * the asynchronous kworker to batch
> +	 * release swap entries
> +	 */
> +	struct work_struct free_work;
> +
> +	/* batch cache swap entries */
> +	unsigned int batch_count;
> +	struct mmu_swap_batch *active;
> +	struct mmu_swap_batch local;
> +	encoded_swpentry_t __encoded_entrys[MMU_GATHER_BUNDLE];
> +};
> +
> +bool __tlb_remove_swap_entries(struct mmu_gather *tlb,
> +		swp_entry_t entry, int nr);
> +#endif
> +
>  /*
>   * struct mmu_gather is an opaque type used by the mm code for passing around
>   * any data needed by arch specific code for tlb_remove_page.
> @@ -343,6 +374,18 @@ struct mmu_gather {
>  	unsigned int vma_exec : 1;
>  	unsigned int vma_huge : 1;
>  	unsigned int vma_pfn  : 1;
> +#ifndef CONFIG_MMU_GATHER_NO_GATHER
> +	/*
> +	 * Two states of releasing swap entries
> +	 * asynchronously:
> +	 * swp_freeable - may release asynchronously in the future
> +	 * swp_freeing  - is releasing asynchronously.
> +	 */
> +	unsigned int swp_freeable : 1;
> +	unsigned int swp_freeing : 1;
> +	unsigned int swp_disable : 1;
> +#endif
>
>  	unsigned int batch_count;
>
> @@ -354,6 +397,7 @@ struct mmu_gather {
>  #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
>  	unsigned int page_size;
>  #endif
> +	struct mmu_swap_gather *swp;
>  #endif
>  };
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 165c58b12ccc..2f66303f1519
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -283,6 +283,64 @@ typedef struct {
>  	unsigned long val;
>  } swp_entry_t;
>
> +/*
> + * encoded_swpentry_t - a type marking the encoded swp_entry_t.
> + *
> + * An 'encoded_swpentry_t' represents a 'swp_entry_t' with its highest
> + * bit indicating extra context-dependent information. Only used in the
> + * swp_entry asynchronous release path by mmu_swap_gather.
> + */
> +typedef struct {
> +	unsigned long val;
> +} encoded_swpentry_t;
> +
> +/*
> + * If this bit is set, the next item in an encoded_swpentry_t array is the
> + * "nr" argument, specifying the total number of consecutive swap entries
> + * associated with the same folio. If this bit is not set, "nr" is
> + * implicitly 1.
> + *
> + * Refer to include/asm/pgtable.h, swp_offset bits: 0 ~ 57, swp_type bits: 58 ~ 62.
> + * Bit 63 can be used here.
> + */
> +#define ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT	(1UL << (BITS_PER_LONG - 1))
> +
> +static __always_inline encoded_swpentry_t
> +encode_swpentry(swp_entry_t entry, unsigned long flags)
> +{
> +	encoded_swpentry_t ret;
> +
> +	VM_WARN_ON_ONCE(flags & ~ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT);
> +	ret.val = flags | entry.val;
> +	return ret;
> +}
> +
> +static inline unsigned long encoded_swpentry_flags(encoded_swpentry_t entry)
> +{
> +	return ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT & entry.val;
> +}
> +
> +static inline swp_entry_t encoded_swpentry_data(encoded_swpentry_t entry)
> +{
> +	swp_entry_t ret;
> +
> +	ret.val = ~ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT & entry.val;
> +	return ret;
> +}
> +
> +static __always_inline encoded_swpentry_t encode_nr_swpentrys(unsigned long nr)
> +{
> +	encoded_swpentry_t ret;
> +
> +	VM_WARN_ON_ONCE(nr & ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT);
> +	ret.val = nr;
> +	return ret;
> +}
> +
> +static __always_inline unsigned long encoded_nr_swpentrys(encoded_swpentry_t entry)
> +{
> +	return ((~ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT) & entry.val);
> +}
> +
>  /**
>   * struct folio - Represents a contiguous set of bytes.
>   * @flags: Identical to the page flags.
> diff --git a/mm/memory.c b/mm/memory.c
> index d6a9dcddaca4..023a8adcb67c
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1650,7 +1650,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			if (!should_zap_cows(details))
>  				continue;
>  			rss[MM_SWAPENTS] -= nr;
> -			free_swap_and_cache_nr(entry, nr);
> +			if (!__tlb_remove_swap_entries(tlb, entry, nr))
> +				free_swap_and_cache_nr(entry, nr);
>  		} else if (is_migration_entry(entry)) {
>  			folio = pfn_swap_entry_folio(entry);
>  			if (!should_zap_folio(details, folio))
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 99b3e9408aa0..33dc9d1faff9
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -9,11 +9,303 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
>
>  #ifndef CONFIG_MMU_GATHER_NO_GATHER
> +/*
> + * The swp_entry asynchronous release mechanism for multiple processes with
> + * independent mm exiting simultaneously.
> + *
> + * While multiple exiting processes release their own mm simultaneously,
> + * the swap entries of the exiting processes are handled by isolating,
> + * caching and handing them over to an asynchronous kworker to complete
> + * the release.
> + *
> + * The conditions for an exiting process to enter the swp_entry asynchronous
> + * release path:
> + * 1. The exiting process's MM_SWAPENTS count is >= SWAP_CLUSTER_MAX, to avoid
> + *    allocating struct mmu_swap_gather frequently.
> + * 2. The number of exiting processes is >= NR_MIN_EXITING_PROCESSES.

Hi Zhiguo,

I'm curious about the significance of NR_MIN_EXITING_PROCESSES. Judging by
the data from the patch below, swap entry freeing can be a bottleneck even
for a single process:

mm: attempt to batch free swap entries for zap_pte_range()
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-stable&id=bea67dcc5ee

"munmap bandwidth becomes 3X faster."

So what would happen if you simply set NR_MIN_EXITING_PROCESSES to 1?

> + *
> + * Since the number of exiting processes is determined dynamically, an
> + * exiting process may enter the swp_entry asynchronous release at the
> + * beginning or in the middle of its swp_entry release path.
> + *
> + * Once an exiting process enters the swp_entry asynchronous release, all
> + * remaining swap entries of this exiting process need to be fully released
> + * by the asynchronous kworker, theoretically.

Freeing a slot can indeed release memory from `zRAM`, potentially returning
it to the system for allocation. Your patch frees swap slots asynchronously;
I assume this doesn't slow down the freeing of `zRAM` memory, or could it
actually do so? Then again, freeing compressed memory might not be as
crucial as freeing uncompressed memory with present PTEs.

> + *
> + * The function of the swp_entry asynchronous release:
> + * 1. Alleviate the high system cpu load caused by multiple exiting processes
> + *    running simultaneously.
> + * 2. Reduce lock competition in the swap entry free path by an asynchronous
> + *    kworker instead of multiple exiting processes executing in parallel.
> + * 3. Release pte_present memory occupied by exiting processes more efficiently.
> + */
> +
> +/*
> + * The min number of exiting processes required for swp_entry asynchronous release
> + */
> +#define NR_MIN_EXITING_PROCESSES 2
> +
> +static atomic_t nr_exiting_processes = ATOMIC_INIT(0);
> +static struct kmem_cache *swap_gather_cachep;
> +static struct workqueue_struct *swapfree_wq;
> +static DEFINE_STATIC_KEY_TRUE(tlb_swap_asyncfree_disabled);
> +
> +static int __init tlb_swap_async_free_setup(void)
> +{
> +	swapfree_wq = alloc_workqueue("smfree_wq", WQ_UNBOUND |
> +			WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);
> +	if (!swapfree_wq)
> +		goto fail;
> +
> +	swap_gather_cachep = kmem_cache_create("swap_gather",
> +			sizeof(struct mmu_swap_gather),
> +			0, SLAB_TYPESAFE_BY_RCU | SLAB_PANIC | SLAB_ACCOUNT,
> +			NULL);
> +	if (!swap_gather_cachep)
> +		goto kcache_fail;
> +
> +	static_branch_disable(&tlb_swap_asyncfree_disabled);
> +	return 0;
> +
> +kcache_fail:
> +	destroy_workqueue(swapfree_wq);
> +fail:
> +	return -ENOMEM;
> +}
> +postcore_initcall(tlb_swap_async_free_setup);
> +
> +static void __tlb_swap_gather_free(struct mmu_swap_gather *swap_gather)
> +{
> +	struct mmu_swap_batch *swap_batch, *next;
> +
> +	for (swap_batch = swap_gather->local.next; swap_batch; swap_batch = next) {
> +		next = swap_batch->next;
> +		free_page((unsigned long)swap_batch);
> +	}
> +	swap_gather->local.next = NULL;
> +	kmem_cache_free(swap_gather_cachep, swap_gather);
> +}
> +
> +static void tlb_swap_async_free_work(struct work_struct *w)
> +{
> +	int i, nr_multi, nr_free;
> +	swp_entry_t start_entry;
> +	struct mmu_swap_batch *swap_batch;
> +	struct mmu_swap_gather *swap_gather = container_of(w,
> +			struct mmu_swap_gather, free_work);
> +
> +	/* Release swap entries cached in mmu_swap_batch. */
> +	for (swap_batch = &swap_gather->local; swap_batch && swap_batch->nr;
> +	     swap_batch = swap_batch->next) {
> +		nr_free = 0;
> +		for (i = 0; i < swap_batch->nr; i++) {
> +			if (unlikely(encoded_swpentry_flags(swap_batch->encoded_entrys[i]) &
> +			    ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT)) {
> +				start_entry = encoded_swpentry_data(swap_batch->encoded_entrys[i]);
> +				nr_multi = encoded_nr_swpentrys(swap_batch->encoded_entrys[++i]);
> +				free_swap_and_cache_nr(start_entry, nr_multi);
> +				nr_free += 2;
> +			} else {
> +				start_entry = encoded_swpentry_data(swap_batch->encoded_entrys[i]);
> +				free_swap_and_cache_nr(start_entry, 1);
> +				nr_free++;
> +			}
> +		}
> +		swap_batch->nr -= nr_free;
> +		VM_BUG_ON(swap_batch->nr);
> +	}
> +	__tlb_swap_gather_free(swap_gather);
> +}
> +
> +static bool __tlb_swap_gather_mmu_check(struct mmu_gather *tlb)
> +{
> +	/*
> +	 * Only the exiting processes with the MM_SWAPENTS counter >=
> +	 * SWAP_CLUSTER_MAX have the opportunity to release their swap
> +	 * entries by asynchronous kworker.
> +	 */
> +	if (!task_is_dying() ||
> +	    get_mm_counter(tlb->mm, MM_SWAPENTS) < SWAP_CLUSTER_MAX)
> +		return true;
> +
> +	atomic_inc(&nr_exiting_processes);
> +	if (atomic_read(&nr_exiting_processes) < NR_MIN_EXITING_PROCESSES)
> +		tlb->swp_freeable = 1;
> +	else
> +		tlb->swp_freeing = 1;
> +
> +	return false;
> +}
> +
> +/**
> + * __tlb_swap_gather_init - Initialize an mmu_swap_gather structure
> + * for swp_entry tear-down.
> + * @tlb: the mmu_swap_gather structure belongs to tlb
> + */
> +static bool __tlb_swap_gather_init(struct mmu_gather *tlb)
> +{
> +	tlb->swp = kmem_cache_alloc(swap_gather_cachep, GFP_ATOMIC | GFP_NOWAIT);
> +	if (unlikely(!tlb->swp))
> +		return false;
> +
> +	tlb->swp->local.next = NULL;
> +	tlb->swp->local.nr = 0;
> +	tlb->swp->local.max = ARRAY_SIZE(tlb->swp->__encoded_entrys);
> +
> +	tlb->swp->active = &tlb->swp->local;
> +	tlb->swp->batch_count = 0;
> +
> +	INIT_WORK(&tlb->swp->free_work, tlb_swap_async_free_work);
> +	return true;
> +}
> +
> +static void __tlb_swap_gather_mmu(struct mmu_gather *tlb)
> +{
> +	if (static_branch_unlikely(&tlb_swap_asyncfree_disabled))
> +		return;
> +
> +	tlb->swp = NULL;
> +	tlb->swp_freeable = 0;
> +	tlb->swp_freeing = 0;
> +	tlb->swp_disable = 0;
> +
> +	if (__tlb_swap_gather_mmu_check(tlb))
> +		return;
> +
> +	/*
> +	 * If the exiting process meets the conditions of
> +	 * swp_entry asynchronous release, an mmu_swap_gather
> +	 * structure will be initialized.
> +	 */
> +	if (tlb->swp_freeing)
> +		__tlb_swap_gather_init(tlb);
> +}
> +
> +static void __tlb_swap_gather_queuework(struct mmu_gather *tlb, bool finish)
> +{
> +	queue_work(swapfree_wq, &tlb->swp->free_work);
> +	tlb->swp = NULL;
> +	if (!finish)
> +		__tlb_swap_gather_init(tlb);
> +}
> +
> +static bool __tlb_swap_next_batch(struct mmu_gather *tlb)
> +{
> +	struct mmu_swap_batch *swap_batch;
> +
> +	if (tlb->swp->batch_count == MAX_SWAP_GATHER_BATCH_COUNT)
> +		goto free;
> +
> +	swap_batch = (void *)__get_free_page(GFP_ATOMIC | GFP_NOWAIT);
> +	if (unlikely(!swap_batch))
> +		goto free;
> +
> +	swap_batch->next = NULL;
> +	swap_batch->nr = 0;
> +	swap_batch->max = MAX_SWAP_GATHER_BATCH;
> +
> +	tlb->swp->active->next = swap_batch;
> +	tlb->swp->active = swap_batch;
> +	tlb->swp->batch_count++;
> +	return true;
> +free:
> +	/* batch move to wq */
> +	__tlb_swap_gather_queuework(tlb, false);
> +	return false;
> +}
> +
> +/**
> + * __tlb_remove_swap_entries - the swap entries in exiting process are
> + * isolated, batch cached in struct mmu_swap_batch.
> + * @tlb: the current mmu_gather
> + * @entry: swp_entry to be isolated and cached
> + * @nr: the number of consecutive entries starting from entry parameter.
> + */
> +bool __tlb_remove_swap_entries(struct mmu_gather *tlb,
> +		swp_entry_t entry, int nr)
> +{
> +	struct mmu_swap_batch *swap_batch;
> +	unsigned long flags = 0;
> +	bool ret = false;
> +
> +	if (tlb->swp_disable)
> +		return ret;
> +
> +	if (!tlb->swp_freeable && !tlb->swp_freeing)
> +		return ret;
> +
> +	if (tlb->swp_freeable) {
> +		if (atomic_read(&nr_exiting_processes) <
> +		    NR_MIN_EXITING_PROCESSES)
> +			return ret;
> +		/*
> +		 * If the current number of exiting processes
> +		 * is >= NR_MIN_EXITING_PROCESSES, the exiting
> +		 * process with swp_freeable state will enter
> +		 * swp_freeing state to start releasing its
> +		 * remaining swap entries by the asynchronous
> +		 * kworker.
> +		 */
> +		tlb->swp_freeable = 0;
> +		tlb->swp_freeing = 1;
> +	}
> +
> +	VM_BUG_ON(tlb->swp_freeable || !tlb->swp_freeing);
> +	if (!tlb->swp && !__tlb_swap_gather_init(tlb))
> +		return ret;
> +
> +	swap_batch = tlb->swp->active;
> +	if (unlikely(swap_batch->nr >= swap_batch->max - 1)) {
> +		__tlb_swap_gather_queuework(tlb, false);
> +		return ret;
> +	}
> +
> +	if (likely(nr == 1)) {
> +		swap_batch->encoded_entrys[swap_batch->nr++] = encode_swpentry(entry, flags);
> +	} else {
> +		flags |= ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT;
> +		swap_batch->encoded_entrys[swap_batch->nr++] = encode_swpentry(entry, flags);
> +		swap_batch->encoded_entrys[swap_batch->nr++] = encode_nr_swpentrys(nr);
> +	}
> +	ret = true;
> +
> +	if (swap_batch->nr >= swap_batch->max - 1) {
> +		if (!__tlb_swap_next_batch(tlb))
> +			goto exit;
> +		swap_batch = tlb->swp->active;
> +	}
> +	VM_BUG_ON(swap_batch->nr > swap_batch->max - 1);
> +exit:
> +	return ret;
> +}
> +
> +static void __tlb_batch_swap_finish(struct mmu_gather *tlb)
> +{
> +	if (tlb->swp_disable)
> +		return;
> +
> +	if (!tlb->swp_freeable && !tlb->swp_freeing)
> +		return;
> +
> +	if (tlb->swp_freeable) {
> +		tlb->swp_freeable = 0;
> +		VM_BUG_ON(tlb->swp_freeing);
> +		goto exit;
> +	}
> +	tlb->swp_freeing = 0;
> +	if (unlikely(!tlb->swp))
> +		goto exit;
> +
> +	__tlb_swap_gather_queuework(tlb, true);
> +exit:
> +	atomic_dec(&nr_exiting_processes);
> +}
>
>  static bool tlb_next_batch(struct mmu_gather *tlb)
>  {
> @@ -386,6 +678,9 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
>  	tlb->local.max = ARRAY_SIZE(tlb->__pages);
>  	tlb->active = &tlb->local;
>  	tlb->batch_count = 0;
> +
> +	tlb->swp_disable = 1;
> +	__tlb_swap_gather_mmu(tlb);
>  #endif
>  	tlb->delayed_rmap = 0;
>
> @@ -466,6 +761,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb)
>
>  #ifndef CONFIG_MMU_GATHER_NO_GATHER
>  	tlb_batch_list_free(tlb);
> +	__tlb_batch_swap_finish(tlb);
>  #endif
>  	dec_tlb_flush_pending(tlb->mm);
>  }
> --
> 2.39.0
>

Thanks
Barry
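
For readers skimming the thread, below is a minimal userspace sketch of the
entry-encoding convention the patch relies on: bit 63 of an encoded value
marks "the next array slot holds the run length", otherwise the run length
is implicitly 1, which loosely mirrors how tlb_swap_async_free_work() walks
a batch. The names and values here are illustrative only, not the kernel
implementation.

    /* encoding_sketch.c - illustrative only; compile with any C99 compiler */
    #include <stdio.h>
    #include <stdint.h>

    /* analogous to ENCODED_SWPENTRY_BIT_NR_ENTRYS_NEXT in the patch */
    #define NR_NEXT_FLAG (1ULL << 63)

    int main(void)
    {
            uint64_t batch[4];
            unsigned int n = 0;

            /* a single swap entry: stored as-is, run length implicitly 1 */
            batch[n++] = 0x1234;

            /*
             * a run of 8 consecutive entries starting at 0x5000:
             * flag the first slot, put the count in the next slot
             */
            batch[n++] = 0x5000 | NR_NEXT_FLAG;
            batch[n++] = 8;

            /* the async worker would walk the array roughly like this */
            for (unsigned int i = 0; i < n; i++) {
                    uint64_t start = batch[i] & ~NR_NEXT_FLAG;
                    uint64_t nr = 1;

                    if (batch[i] & NR_NEXT_FLAG)
                            nr = batch[++i];
                    printf("free %llu entr%s starting at 0x%llx\n",
                           (unsigned long long)nr, nr == 1 ? "y" : "ies",
                           (unsigned long long)start);
            }
            return 0;
    }

The two-slot scheme trades one extra array slot per multi-entry run for the
ability to hand a whole run to free_swap_and_cache_nr() in a single call.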