From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, zhengtangquan@oppo.com, Barry Song,
 Baolin Wang, David Hildenbrand, Johannes Weiner, Matthew Wilcox,
 Oscar Salvador, Ryan Roberts, Zi Yan, Lorenzo Stoakes,
 "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko
Subject: [PATCH RFC v2] mm: add support for dropping LRU recency on process exit
Date: Wed, 14 May 2025 19:08:20 +1200
Message-Id: <20250514070820.51793-1-21cnbao@gmail.com>

From: Barry Song

Currently, both zap_pmd and zap_pte always promote young file folios,
regardless of whether the owning process is dying.

However, on systems where a process's recency fades when it dies, we may
want to reverse this behavior. The goal is to reclaim the folios of the
dying process as quickly as possible, so that new processes can acquire
memory sooner.

For example, when Firefox is killed and LibreOffice is launched,
activating Firefox's young file-backed folios makes it harder to reclaim
memory that LibreOffice doesn't use at all.

On systems like Android, processes are either explicitly stopped by user
action or reaped due to OOM after being inactive for a long time. These
processes are unlikely to restart in the near future. Rather than
promoting their folios, we skip the promotion and demote their exclusive
folios so that memory can be reclaimed and made available to new
user-facing processes.

Users likely do not care about the recency of a dying process. However,
we still need an explicit indication from userspace before taking this
action. This patch therefore introduces a prctl to provide that
user-level hint, as suggested by Johannes and David.

We observed noticeable improvements in refaults, swap-ins, and swap-outs
on a hooked Android kernel. More data for this specific version will
follow.

Cc: Baolin Wang
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Matthew Wilcox (Oracle)
Cc: Oscar Salvador
Cc: Ryan Roberts
Cc: Zi Yan
Cc: Lorenzo Stoakes
Cc: "Liam R. Howlett"
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Suren Baghdasaryan
Cc: Michal Hocko
Signed-off-by: Barry Song
---
 -v2:
 * add prctl as suggested by Johannes and David
 * demote exclusive file folios if drop_recency can apply
 -v1:
 https://lore.kernel.org/linux-mm/20250412085852.48524-1-21cnbao@gmail.com/

 include/linux/mm_types.h   |  1 +
 include/uapi/linux/prctl.h |  3 +++
 kernel/sys.c               | 16 ++++++++++++++++
 mm/huge_memory.c           | 12 ++++++++++--
 mm/internal.h              | 14 ++++++++++++++
 mm/memory.c                | 12 +++++++++++-
 6 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 15808cad2bc1..84ab113c54a2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1733,6 +1733,7 @@ enum {
  * on NFS restore
  */
 //#define MMF_EXE_FILE_CHANGED	18 /* see prctl_set_mm_exe_file() */
+#define MMF_FADE_ON_DEATH	18 /* Recency is discarded on process exit */
 
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 15c18ef4eb11..22d861157552 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -364,4 +364,7 @@ struct prctl_mm_map {
 # define PR_TIMER_CREATE_RESTORE_IDS_ON		1
 # define PR_TIMER_CREATE_RESTORE_IDS_GET	2
 
+#define PR_SET_FADE_ON_DEATH	78
+#define PR_GET_FADE_ON_DEATH	79
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index c434968e9f5d..cabe1bbb35a4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2658,6 +2658,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
 		mmap_write_unlock(me->mm);
 		break;
+	case PR_GET_FADE_ON_DEATH:
+		if (arg2 || arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = !!test_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
+		break;
+	case PR_SET_FADE_ON_DEATH:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (mmap_write_lock_killable(me->mm))
+			return -EINTR;
+		if (arg2)
+			set_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
+		else
+			clear_bit(MMF_FADE_ON_DEATH, &me->mm->flags);
+		mmap_write_unlock(me->mm);
+		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:
 		/* No longer implemented: */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2780a12b25f0..c99894611d4a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2204,6 +2204,7 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
+	bool drop_recency = false;
 	pmd_t orig_pmd;
 	spinlock_t *ptl;
 
@@ -2260,13 +2261,20 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			add_mm_counter(tlb->mm, mm_counter_file(folio),
 				       -HPAGE_PMD_NR);
 
+			drop_recency = zap_need_to_drop_recency(tlb->mm);
 			/*
 			 * Use flush_needed to indicate whether the PMD entry
 			 * is present, instead of checking pmd_present() again.
 			 */
-			if (flush_needed && pmd_young(orig_pmd) &&
-			    likely(vma_has_recency(vma)))
+			if (flush_needed && pmd_young(orig_pmd) && !drop_recency &&
+			    likely(vma_has_recency(vma)))
 				folio_mark_accessed(folio);
+			/*
+			 * Userspace explicitly marks recency to fade when the process
+			 * dies; demote exclusive file folios to aid reclamation.
+			 */
+			if (drop_recency && !folio_maybe_mapped_shared(folio))
+				deactivate_file_folio(folio);
 		}
 
 		spin_unlock(ptl);
diff --git a/mm/internal.h b/mm/internal.h
index 6b8ed2017743..af9649b3e84a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -130,6 +131,19 @@ static inline int folio_nr_pages_mapped(const struct folio *folio)
 	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
 }
 
+/*
+ * Returns true if the process attached to the mm is dying or undergoing
+ * OOM reaping, and its recency—explicitly marked by userspace—will also
+ * fade; otherwise, returns false.
+ */
+static inline bool zap_need_to_drop_recency(struct mm_struct *mm)
+{
+	if (!atomic_read(&mm->mm_users) || check_stable_address_space(mm))
+		return !!test_bit(MMF_FADE_ON_DEATH, &mm->flags);
+
+	return false;
+}
+
 /*
  * Retrieve the first entry of a folio based on a provided entry within the
  * folio. We cannot rely on folio->swap as there is no guarantee that it has
diff --git a/mm/memory.c b/mm/memory.c
index 5a7e4c0e89c7..6dd01a7736a8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1505,6 +1505,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 		bool *force_flush, bool *force_break, bool *any_skipped)
 {
 	struct mm_struct *mm = tlb->mm;
+	bool drop_recency = false;
 	bool delay_rmap = false;
 
 	if (!folio_test_anon(folio)) {
@@ -1516,9 +1517,18 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 				*force_flush = true;
 			}
 		}
-		if (pte_young(ptent) && likely(vma_has_recency(vma)))
+
+		drop_recency = zap_need_to_drop_recency(mm);
+		if (pte_young(ptent) && !drop_recency &&
+		    likely(vma_has_recency(vma)))
 			folio_mark_accessed(folio);
 		rss[mm_counter(folio)] -= nr;
+		/*
+		 * Userspace explicitly marks recency to fade when the process dies;
+		 * demote exclusive file folios to aid reclamation.
+		 */
+		if (drop_recency && !folio_maybe_mapped_shared(folio))
+			deactivate_file_folio(folio);
 	} else {
 		/* We don't need up-to-date accessed/dirty bits. */
 		clear_full_ptes(mm, addr, pte, nr, tlb->fullmm);
-- 
2.39.3 (Apple Git-146)
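
For readers who want to reason about the promote/demote decision without
building a kernel, the logic of the zap hunks in this patch can be modeled
in plain userspace C. This is only a sketch under stated assumptions:
struct mm_model, enum zap_action and the two model_* functions are
hypothetical names invented for illustration and are not kernel API. The
booleans stand in for the results of atomic_read(&mm->mm_users),
check_stable_address_space(), test_bit(MMF_FADE_ON_DEATH, ...),
pte_young()/pmd_young(), vma_has_recency() and
folio_maybe_mapped_shared(). The PMD path's extra flush_needed condition
is folded into the "young" input.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the mm state the patch inspects. */
struct mm_model {
	int  mm_users;       /* models atomic_read(&mm->mm_users) */
	bool unstable;       /* models check_stable_address_space(mm) != 0 */
	bool fade_on_death;  /* models the MMF_FADE_ON_DEATH bit */
};

/* What the zap path does with a file folio in this model. */
enum zap_action {
	ZAP_LEAVE,    /* neither promoted nor demoted */
	ZAP_PROMOTE,  /* corresponds to folio_mark_accessed() */
	ZAP_DEMOTE,   /* corresponds to deactivate_file_folio() */
};

/* Mirrors zap_need_to_drop_recency() from the mm/internal.h hunk. */
static bool model_need_to_drop_recency(const struct mm_model *mm)
{
	if (mm->mm_users == 0 || mm->unstable)
		return mm->fade_on_death;

	return false;
}

/*
 * Mirrors the combined effect of the two checks in the zap hunks:
 * promotion needs a young entry, a VMA with recency and no drop;
 * demotion needs a drop and a folio that is not mapped shared.
 * The two conditions are mutually exclusive, so one action results.
 */
static enum zap_action model_zap_file_folio(const struct mm_model *mm,
					    bool young, bool vma_has_recency,
					    bool mapped_shared)
{
	bool drop_recency = model_need_to_drop_recency(mm);

	if (drop_recency)
		return mapped_shared ? ZAP_LEAVE : ZAP_DEMOTE;
	if (young && vma_has_recency)
		return ZAP_PROMOTE;

	return ZAP_LEAVE;
}
```

In this model, a dying mm (mm_users == 0) with the hint set demotes its
exclusive file folios and leaves shared ones alone, while an mm without
the hint keeps today's behavior of promoting young folios.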