From: Barry Song <21cnbao@gmail.com>
Date: Thu, 22 May 2025 14:05:37 +1200
Subject: Re: [PATCH RFC v2] mm: add support for dropping LRU recency on process exit
To: "Liam R. Howlett", Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, zhengtangquan@oppo.com, Barry Song, Baolin Wang, David Hildenbrand, Johannes Weiner, Matthew Wilcox, Oscar Salvador, Ryan Roberts, Zi Yan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
References: <20250514070820.51793-1-21cnbao@gmail.com>

Hi Liam,

I really appreciate your review, thank you!

On Wed, May 21, 2025 at 4:20 AM Liam R. Howlett wrote:
>
> * Barry Song <21cnbao@gmail.com> [250514 03:08]:
> > From: Barry Song
> >
> > Currently, both zap_pmd and zap_pte always promote young file folios,
> > regardless of whether the processes are dying.
> > However, in systems where the process recency fades upon dying, we may
> > want to reverse this behavior. The goal is to reclaim the folios from
> > the dying process as quickly as possible, allowing new processes to
> > acquire memory ASAP.
> > For example, while Firefox is killed and LibreOffice is launched,
> > activating Firefox's young file-backed folios makes it harder to
> > reclaim memory that LibreOffice doesn't use at all.
> >
> > On systems like Android, processes are either explicitly stopped by
> > user action or reaped due to OOM after being inactive for a long time.
> > These processes are unlikely to restart in the near future.
> > Rather than promoting their folios, we skip promoting and demote their
> > exclusive folios so that memory can be reclaimed and made available for
> > new user-facing processes.
> >
> > Users possibly do not care about the recency of a dying process.
> > However, we still need an explicit user indication to take this action.
>
> Can you add why? It'd be nice to capture the reasons pointed out in v1
> discussion as they seem important to why this isn't set as a default for
> all tasks.

Essentially, I took Johannes' point (and to some extent David's as well)
to be that it behaves somewhat unpredictably in broader application
scenarios, for example when repeatedly executing a file in a script or
restarting an application shortly after it exits.

Also, when a shared library is mapped by multiple processes, we might
still want to retain recency information from a process that is exiting.
So we might want to do that only for exclusive folios.

This actually leads to two questions:

1. Are we confident that the recency of a dead process is no longer
   useful within a period of time?
2. Should we limit the optimization only to exclusive folios, for
   example shared objects (.so files) that are specific to the exiting
   process?

For both questions, the answer seems to be yes. Though in the first
case, when we repeatedly restart the same application, the folios are
likely still in the LRU and may still be hit even if we unconditionally
demote them. But that's not guaranteed. So we likely need a userspace
hint to eliminate the uncertainty.

>
> > Thus, we introduced a prctl to provide that necessary user-level hint
> > as suggested by Johannes and David.
>
> I'm not sure it really makes much of a difference if we update the lru
> or not in this case. Johannes point about this small change having
> unknown results for the larger community is certainly the best argument
> as to why we need this to be opt-in.
>
> We should probably document it so that people can opt-in though :)
>
> >
> > We observed noticeable improvements in refaults, swap-ins, and swap-outs
> > on a hooked Android kernel. More data for this specific version will
> > follow.
>
> Looking forward to the results. What happens when I kill my app and
> reopen it? (close all apps, open the one that was being annoying?)

I'm not sure I fully understand your question. In Android, we're
primarily concerned with smooth app switching. For example, in a
sequence like A → B → C → D → E, if we can quickly reclaim folios from
dead processes, it helps us launch new (different) apps faster.

However, if we do A → kill A → start A → kill A → start A repeatedly,
it's likely not a problem because our memory can hold the same
application. The issue arises when memory isn't enough to hold
A + B + C + D + E simultaneously.

I'm not overly concerned about repeatedly restarting the same
application in Android. However, for wider scenarios across various
industries, I'm uncertain.

>
> >
> > Cc: Baolin Wang
> > Cc: David Hildenbrand
> > Cc: Johannes Weiner
> > Cc: Matthew Wilcox (Oracle)
> > Cc: Oscar Salvador
> > Cc: Ryan Roberts
> > Cc: Zi Yan
> > Cc: Lorenzo Stoakes
> > Cc: "Liam R. Howlett"
> > Cc: Vlastimil Babka
> > Cc: Mike Rapoport
> > Cc: Suren Baghdasaryan
> > Cc: Michal Hocko
> > Signed-off-by: Barry Song
> > ---
> > -v2:
> >  * add prctl as suggested by Johannes and David
> >  * demote exclusive file folios if drop_recency can apply
> > -v1:
> >  https://lore.kernel.org/linux-mm/20250412085852.48524-1-21cnbao@gmail.com/
> >
> >  include/linux/mm_types.h   |  1 +
> >  include/uapi/linux/prctl.h |  3 +++
> >  kernel/sys.c               | 16 ++++++++++++++++
> >  mm/huge_memory.c           | 12 ++++++++++--
> >  mm/internal.h              | 14 ++++++++++++++
> >  mm/memory.c                | 12 +++++++++++-
> >  6 files changed, 55 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 15808cad2bc1..84ab113c54a2 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -1733,6 +1733,7 @@ enum {
> >   * on NFS restore
> >   */
> >  //#define MMF_EXE_FILE_CHANGED	18	/* see prctl_set_mm_exe_file() */
> > +#define MMF_FADE_ON_DEATH	18	/* Recency is discarded on process exit */
>
> Why is recency not in the MMF name? Why not MMF_NO_RECENCY or
> something?

I included RECENCY in the name but found it too long. On the other hand,
MMF_NO_RECENCY seems insufficient to convey the true meaning, since we do
have recency; it's just lost on death. So perhaps the original, longer
names I considered are better: MMF_RECENCY_FADE_ON_DEATH or
MMF_NO_RECENCY_ON_DEATH?

>
> I guess we are back to no space in this flag.

Yes, it is 32 bits.
>
> >
> >  #define MMF_HAS_UPROBES		19	/* has uprobes */
> >  #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 15c18ef4eb11..22d861157552 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -364,4 +364,7 @@ struct prctl_mm_map {
> >  # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
> >  # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
> >
> > +#define PR_SET_FADE_ON_DEATH	78
> > +#define PR_GET_FADE_ON_DEATH	79
> > +
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index c434968e9f5d..cabe1bbb35a4 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2658,6 +2658,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
> >  		mmap_write_unlock(me->mm);
> >  		break;
> > +	case PR_GET_FADE_ON_DEATH:
> > +		if (arg2 || arg3 || arg4 || arg5)
> > +			return -EINVAL;
> > +		error = !!test_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
> > +		break;
>
> Is there a usecase for get?

Probably not. I was just trying to implement set/get as a pair.
I'm happy to remove it if you feel it's redundant.

>
> > +	case PR_SET_FADE_ON_DEATH:
>
> Could you just check the value prior to setting and just return if it's
> what you want? In which case, the setting is just change_bit(), and
> there probably isn't a need for a get?

Ok.
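
A minimal userspace sketch of the set-only handler suggested above; it is
not kernel code. It assumes C11 atomics can stand in for the kernel's
bitops. `struct mm_model` and the `model_*` helpers are hypothetical names
made up for illustration, and only the bit number 18 comes from the patch.
As in the patch, callers are assumed to be serialized (the patch takes
mmap_write_lock_killable()), because test-then-toggle is not atomic as a
whole.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number taken from the RFC patch; everything else is a stand-in. */
#define MMF_FADE_ON_DEATH 18

/* Hypothetical stand-in for mm_struct: only the flags word is modelled. */
struct mm_model {
	atomic_ulong flags;
};

/* Userspace analogue of test_bit(): read one bit of the flags word. */
static bool model_test_bit(int nr, struct mm_model *mm)
{
	assert(nr >= 0 && nr < 32);
	return (atomic_load(&mm->flags) >> nr) & 1UL;
}

/* Userspace analogue of change_bit(): atomically toggle one bit. */
static void model_change_bit(int nr, struct mm_model *mm)
{
	assert(nr >= 0 && nr < 32);
	atomic_fetch_xor(&mm->flags, 1UL << nr);
}

/*
 * Set-only handler: return early if the flag already has the requested
 * value, otherwise toggle it. Assumes the caller holds whatever lock
 * serializes writers. Returns 0 on success or -EINVAL if any unused
 * argument is non-zero.
 */
static int model_set_fade_on_death(struct mm_model *mm, unsigned long arg2,
				   unsigned long arg3, unsigned long arg4,
				   unsigned long arg5)
{
	if (arg3 || arg4 || arg5)
		return -EINVAL;
	if (model_test_bit(MMF_FADE_ON_DEATH, mm) == !!arg2)
		return 0;	/* already in the requested state */
	model_change_bit(MMF_FADE_ON_DEATH, mm);
	return 0;
}
```

Calling model_set_fade_on_death() twice with the same arg2 leaves the bit
unchanged the second time, which is why a separate get looks unnecessary
in this model.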
>
> > +		if (arg3 || arg4 || arg5)
> > +			return -EINVAL;
> > +		if (mmap_write_lock_killable(me->mm))
> > +			return -EINTR;
> > +		if (arg2)
> > +			set_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
> > +		else
> > +			clear_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
> > +		mmap_write_unlock(me->mm);
> > +		break;
> >  	case PR_MPX_ENABLE_MANAGEMENT:
> >  	case PR_MPX_DISABLE_MANAGEMENT:
> >  		/* No longer implemented: */
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 2780a12b25f0..c99894611d4a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2204,6 +2204,7 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
> >  int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  		 pmd_t *pmd, unsigned long addr)
> >  {
> > +	bool drop_recency = false;
> >  	pmd_t orig_pmd;
> >  	spinlock_t *ptl;
> >
> > @@ -2260,13 +2261,20 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  				add_mm_counter(tlb->mm, mm_counter_file(folio),
> >  					       -HPAGE_PMD_NR);
> >
> > +			drop_recency = zap_need_to_drop_recency(tlb->mm);
> >  			/*
> >  			 * Use flush_needed to indicate whether the PMD entry
> >  			 * is present, instead of checking pmd_present() again.
> >  			 */
> > -			if (flush_needed && pmd_young(orig_pmd) &&
> > -				likely(vma_has_recency(vma)))
> > +			if (flush_needed && pmd_young(orig_pmd) && !drop_recency &&
> > +				likely(vma_has_recency(vma)))
> >  				folio_mark_accessed(folio);
> > +			/*
> > +			 * Userspace explicitly marks recency to fade when the process
> > +			 * dies; demote exclusive file folios to aid reclamation.
> > +			 */
> > +			if (drop_recency && !folio_maybe_mapped_shared(folio))
> > +				deactivate_file_folio(folio);
> >  		}
> >
> >  		spin_unlock(ptl);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 6b8ed2017743..af9649b3e84a 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -11,6 +11,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -130,6 +131,19 @@ static inline int folio_nr_pages_mapped(const struct folio *folio)
> >  	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
> >  }
> >
> > +/*
> > + * Returns true if the process attached to the mm is dying or undergoing
> > + * OOM reaping, and its recency, explicitly marked by userspace, will also
> > + * fade; otherwise, returns false.
> > + */
> > +static inline bool zap_need_to_drop_recency(struct mm_struct *mm)
>
> This name is confusing. We are zapping the need to drop the recency? If
> this returns false, then the need to drop recency is false.. It is not
> very easy to read and harder to understand how it translates to the
> values it returns.
>
> How about mm_has_exit_recency(), like vma_has_recency()?
> Or mmf_update_recency()?

It seems mm_has_exit_recency() is good.

>
> > +{
> > +	if (!atomic_read(&mm->mm_users) || check_stable_address_space(mm))
>
> FYI, failed forks may also set the address space as unstable.
>
> > +		return !!test_bit(MMF_FADE_ON_DEATH, &mm->flags);
> > +
> > +	return false;
> > +}
> > +
> >  /*
> >   * Retrieve the first entry of a folio based on a provided entry within the
> >   * folio. We cannot rely on folio->swap as there is no guarantee that it has
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5a7e4c0e89c7..6dd01a7736a8 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1505,6 +1505,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
> >  		bool *force_flush, bool *force_break, bool *any_skipped)
> >  {
> >  	struct mm_struct *mm = tlb->mm;
> > +	bool drop_recency = false;
> >  	bool delay_rmap = false;
> >
> >  	if (!folio_test_anon(folio)) {
> > @@ -1516,9 +1517,18 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
> >  				*force_flush = true;
> >  			}
> >  		}
> > -		if (pte_young(ptent) && likely(vma_has_recency(vma)))
> > +
> > +		drop_recency = zap_need_to_drop_recency(mm);
> > +		if (pte_young(ptent) && !drop_recency &&
> > +				likely(vma_has_recency(vma)))
>
>
> I really don't like that you are calling an atomic_read() and two flag
> checks every time this block of code is executed. This must impact your
> performance?

Fair enough. That seems like a valid point to consider regarding atomic
operations.

>
> How about this:
> 1. Check in unmap_vmas() that the range is 0 - ULONG_MAX, and if the OOM
>    flag is set.
> 2. set a new zap_flags_t flag (mmf_update_recency, maybe?) if
>    test_bit(MMF_FADE_ON_DEATH)
> 3. check zap_details->zap_flags if that bit is set in this function.
> 4. (hopefully) profit with better performance :)
>
> Since this really is a zap flag, it fits to make it one. It also means
> that you will not need to check an atomic and will only check the one
> flag as opposed to two.
>
> I think we can live with some user (probably syzbot) unmapping 0 -
> ULONG_MAX and incorrectly checking a flag and, in the very rare case of
> actually using this flag, does not do the correct LRU aging. If you
> unmap everything, we can be pretty confident that you will be on the
> exit path rather quickly.

Good idea, let me give this a try.
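
The four steps above can be sketched as a small userspace model; this is
not kernel code and not what the next version will necessarily look like.
Assumptions: `struct mm_model`, `zap_flags_model_t`, ZAP_FLAG_DROP_RECENCY
and the `model_*` functions are hypothetical names; only the bit number 18
comes from the patch. The wording of step 1 leaves open how the range
check and the OOM flag combine, so the sketch assumes that either one
marks the mm as dying. Only file folios are modelled, matching the branch
the patch touches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MMF_FADE_ON_DEATH	18		/* bit number from the RFC patch */
#define ZAP_FLAG_DROP_RECENCY	(1u << 0)	/* hypothetical flag name and value */

typedef unsigned int zap_flags_model_t;	/* stand-in for zap_flags_t */

/* Hypothetical stand-in for mm_struct: only the flags word is modelled. */
struct mm_model {
	unsigned long flags;
};

enum folio_action { FOLIO_LEAVE, FOLIO_MARK_ACCESSED, FOLIO_DEACTIVATE };

/*
 * Steps 1 and 2: decide once per unmap whether recency should be dropped.
 * Assumption: a whole-address-space unmap OR the OOM flag counts as dying.
 */
static zap_flags_model_t model_unmap_flags(const struct mm_model *mm,
					   unsigned long start,
					   unsigned long end, bool oom_flag)
{
	bool whole_space = (start == 0 && end == ULONG_MAX);
	bool opted_in = (mm->flags >> MMF_FADE_ON_DEATH) & 1UL;

	assert(start <= end);
	return ((whole_space || oom_flag) && opted_in) ?
		ZAP_FLAG_DROP_RECENCY : 0;
}

/*
 * Step 3: the per-PTE path tests one bit in the precomputed zap flags
 * instead of reading mm_users and two mm flags for every entry.
 */
static enum folio_action model_zap_file_folio(zap_flags_model_t zap_flags,
					      bool young, bool vma_has_recency,
					      bool mapped_shared)
{
	bool drop_recency = zap_flags & ZAP_FLAG_DROP_RECENCY;

	if (young && !drop_recency && vma_has_recency)
		return FOLIO_MARK_ACCESSED;	/* existing promotion path */
	if (drop_recency && !mapped_shared)
		return FOLIO_DEACTIVATE;	/* demote exclusive folios */
	return FOLIO_LEAVE;
}
```

In this model a partial munmap() by an opted-in process never sets the zap
flag, so its young folios are still promoted as before; only the
whole-range or OOM case reaches the demotion branch.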
>
> >  			folio_mark_accessed(folio);
> >  		rss[mm_counter(folio)] -= nr;
> > +		/*
> > +		 * Userspace explicitly marks recency to fade when the process dies;
> > +		 * demote exclusive file folios to aid reclamation.
> > +		 */
> > +		if (drop_recency && !folio_maybe_mapped_shared(folio))
> > +			deactivate_file_folio(folio);
>
> Thanks,
> Liam
>

Thanks
Barry