Re: [PATCH v0 0/2] mm: swap: Gather swap entries and batch async release

From: Lei Liu <liulei.rjpt@vivo.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>, Chris Li <chrisl@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Chen Yu <yu.c.chen@intel.com>, Hao Jia <jiahao1@lixiang.com>,
	"Kirill A. Shutemov" <kas@kernel.org>,
	Usama Arif <usamaarif642@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Mateusz Guzik <mjguzik@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Fushuai Wang <wangfushuai@baidu.com>,
	"open list:MEMORY MANAGEMENT - OOM KILLER" <linux-mm@kvack.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
	<cgroups@vger.kernel.org>
Subject: Re: [PATCH v0 0/2] mm: swap: Gather swap entries and batch async release
Date: Wed, 10 Sep 2025 22:01:35 +0800
Message-ID: <eee7d740-cf71-40d3-a037-543ae28c187a@vivo.com>
In-Reply-To: <CAMgjq7Ca6zOozixPot3j5FP_6A8h=DFc7yjHKp2Lg+qu7gNwMA@mail.gmail.com>


On 2025/9/9 15:30, Kairui Song wrote:
> On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote:
> Hi Lei,
>
>> 1. Problem Scenario
>> On systems with ZRAM and swap enabled, simultaneous process exits create
>> contention. The primary bottleneck occurs during swap entry release
>> operations, causing exiting processes to monopolize CPU resources. This
>> leads to scheduling delays for high-priority processes.
>>
>> 2. Android Use Case
>> During camera launch, LMKD terminates background processes to free memory.
>> Exiting processes compete for CPU cycles, delaying the camera preview
>> thread and causing visible stuttering - directly impacting user
>> experience.
>>
>> 3. Root Cause Analysis
>> When background applications heavily utilize swap space, process exit
>> profiling reveals 55% of time spent in free_swap_and_cache_nr():
>>
>> Function              Duration (ms)   Percentage
>> do_signal               791.813     **********100%
>> do_group_exit           791.813     **********100%
>> do_exit                 791.813     **********100%
>> exit_mm                 577.859        *******73%
>> exit_mmap               577.497        *******73%
>> zap_pte_range           558.645        *******71%
>> free_swap_and_cache_nr  433.381          *****55%
>> free_swap_slot          403.568          *****51%
> Thanks for sharing this case.
>
> One problem is that now the free_swap_slot function no longer exists
> after 0ff67f990bd4. Have you tested the latest kernel? Or what is the
> actual overhead here?
>
> Some batch freeing optimizations are introduced. And we have reworked
> the whole locking mechanism for swap, so even on a system with 96t the
> contention seems barely observable with common workloads.
>
> And another series is further reducing the contention and the overall
> overhead (24% faster freeing for phase 1):
> https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/
>
> Will these be helpful for you? I think optimizing the root problem is
> better than just deferring the overhead with async workers, which may
> increase the overall overhead and complexity.

Hi Kairui

Thank you for your optimization suggestions. We believe your patch may
help our scenario. We'll try integrating it to evaluate the benefits.
However, it may not fully solve our issue. Below is our problem description:

Flame graph of time distribution for TikTok process exit (~400MB swapped):
do_notify_resume         3.89%
get_signal               3.89%
do_signal_exit           3.88%
do_exit                  3.88%
mmput                    3.22%
exit_mmap                3.22%
unmap_vmas               3.08%
unmap_page_range         3.07%
free_swap_and_cache_nr   1.31%****
swap_entry_range_free    1.17%****
zram_slot_free_notify    1.11%****
zram_free_hw_entry_dc    0.43%
free_zspage[zsmalloc]    0.09%

CPU: 8-core ARM64 (1*4.21GHz + 3*3.5GHz + 4*2.7GHz), 12GB RAM

Exit of a process with ~400MB in swap:
Exit takes 200-300ms at ~4% CPU load.
With more zram compression/swap, exit time increases to 400-500ms.
free_swap_and_cache_nr avg: 0.5ms, max: ~1.5ms (running time).
free_swap_and_cache_nr dominates exit time (33%, up to 50% in the worst
cases). Most of that time is zram resource freeing (0.25ms per operation).
With dozens of simultaneous exits, the cumulative time becomes significant.
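
As a rough cross-check of the figures above, the small Python sketch below
derives the free_swap_and_cache_nr share of exit time from the flame graph
(1.31% out of 3.88% for do_exit) and scales it to several simultaneous exits.
The exit counts (10, 20, 30) and the simple linear model are assumptions for
illustration only; they are not measurements.

```python
# Figures copied from the flame graph and timings in this message.
DO_EXIT_PCT = 3.88      # do_exit share of all samples
FREE_SWAP_PCT = 1.31    # free_swap_and_cache_nr share of all samples


def swap_free_share(free_pct=FREE_SWAP_PCT, exit_pct=DO_EXIT_PCT):
    """Fraction of do_exit time spent in free_swap_and_cache_nr."""
    return free_pct / exit_pct


def cumulative_swap_free_ms(n_exits, exit_ms, share):
    """CPU time (ms) spent freeing swap entries across n_exits exits.

    Assumes every exit costs exit_ms and the same share applies to each,
    which is a simplification.
    """
    return n_exits * exit_ms * share


if __name__ == "__main__":
    share = swap_free_share()
    print(f"swap freeing share of exit time: {share:.1%}")
    for n in (10, 20, 30):  # assumed numbers of simultaneous exits
        low = cumulative_swap_free_ms(n, 200, share)
        high = cumulative_swap_free_ms(n, 300, share)
        print(f"{n} exits: {low:.0f}-{high:.0f} ms of CPU time in swap freeing")
```

The computed share comes out at roughly one third, which is consistent with
the 33% quoted above.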

Optimization approach:
The focus isn't on optimizing hot functions (limited improvement potential).
The high load comes from too many simultaneous exits. We'll make the
time-consuming interfaces in do_exit asynchronous to accelerate exit
completion, while still allowing non-swap pages (file/anonymous) to be freed
by other processes.
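
To illustrate the gather-and-defer idea in userspace terms, here is a toy
Python model. It is not the patch code: the series does this in
mm/swapfile.c with workqueue kworkers, while this sketch uses a thread and a
queue. The class name AsyncSwapReleaser, the release_fn callback and the
threshold value are all hypothetical names chosen for the example.

```python
import queue
import threading


class AsyncSwapReleaser:
    """Toy model: the exit path only records entries, a worker frees them."""

    def __init__(self, release_fn, threshold=4):
        self._release_fn = release_fn   # stands in for the expensive free
        self._threshold = threshold     # hypothetical batch size
        self._pending = []              # entries gathered but not yet queued
        self._queue = queue.Queue()     # batches handed to the worker
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def gather(self, entry):
        """Cheap call on the exit path; assumes a single calling thread."""
        self._pending.append(entry)
        if len(self._pending) >= self._threshold:
            self._flush()

    def _flush(self):
        """Hand the current batch to the worker and start a new one."""
        if self._pending:
            self._queue.put(self._pending)
            self._pending = []

    def finish(self):
        """Queue leftovers and wait for the worker (demo only; a real
        exit path would not block here)."""
        self._flush()
        self._queue.put(None)           # sentinel: no more batches
        self._worker.join()

    def _run(self):
        """Worker loop: release each batch in the order it was queued."""
        while True:
            batch = self._queue.get()
            if batch is None:
                break
            for entry in batch:
                self._release_fn(entry)


if __name__ == "__main__":
    released = []
    releaser = AsyncSwapReleaser(released.append, threshold=4)
    for entry in range(10):
        releaser.gather(entry)
    releaser.finish()
    print(f"released {len(released)} entries")
```

The point of the split is that gather() stays cheap for the exiting task,
while the cost of the release is paid later by the worker. The total work is
not reduced, only moved off the exit path.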

Camera startup scenario:
20-30 background apps, with anonymous pages compressed to zram (200-500MB).
Camera launch triggers lmkd to kill 10+ apps, and their exits consume 25%+
CPU. System services and third-party processes use 60%+ CPU, leaving the
camera startup process CPU-starved and delayed.


Sincere wishes,
Lei


>
>
>> swap_entry_free         393.863          *****50%
>> swap_range_free         372.602           ****47%
>>
>> 4. Optimization Approach
>> a) For processes exceeding swap entry threshold: aggregate and isolate
>> swap entries to enable fast exit
>> b) Asynchronously release batched entries when isolation reaches
>> configured threshold
>>
>> 5. Performance Gains (User Scenario: Camera Cold Launch)
>> a) 74% reduction in process exit latency (>500ms cases)
>> b) ~4% lower peak CPU load during concurrent process exits
>> c) ~70MB additional free memory during camera preview initialization
>> d) 40% reduction in camera preview stuttering probability
>>
>> 6. Prior Art & Improvements
>> Reference: Zhiguo Jiang's patch
>> (https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/)
>>
>> Key enhancements:
>> a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity
>> b) Async release delegated to workqueue kworkers with configurable
>> max_active for NUMA-optimized concurrency
>>
>> Lei Liu (2):
>>    mm: swap: Gather swap entries and batch async release core
>>    mm: swap: Forced swap entries release under memory pressure
>>
>>   include/linux/oom.h           |  23 ++++++
>>   include/linux/swapfile.h      |   2 +
>>   include/linux/vm_event_item.h |   1 +
>>   kernel/exit.c                 |   2 +
>>   mm/memcontrol.c               |   6 --
>>   mm/memory.c                   |   4 +-
>>   mm/page_alloc.c               |   4 +
>>   mm/swapfile.c                 | 134 ++++++++++++++++++++++++++++++++++
>>   mm/vmstat.c                   |   1 +
>>   9 files changed, 170 insertions(+), 7 deletions(-)
>>
>> --
>> 2.34.1
>>
>>
