From: zhiguojiang <justinjiang@vivo.com>
To: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Will Deacon <will@kernel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
Nick Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Arnd Bergmann <arnd@arndb.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
linux-arch@vger.kernel.org, cgroups@vger.kernel.org,
Barry Song <21cnbao@gmail.com>
Cc: opensource.kernel@vivo.com
Subject: Re: [PATCH v3 2/2] mm: tlb: add tlb swap entries batch async release
Date: Mon, 9 Sep 2024 22:43:58 +0800
Message-ID: <d8445378-4eb2-4d5c-b3b8-1e1a5a3b1458@vivo.com>
In-Reply-To: <f58950cd-dbe3-4629-ac92-30c76db7849a@redhat.com>
On 2024/9/9 14:49, David Hildenbrand wrote:
> On 05.08.24 17:36, Zhiguo Jiang wrote:
>> One of the main reasons a process with an independent mm takes a long
>> time to exit is the time-consuming release of its swap entries.
>> The proportion of the process's memory held in swap increases over
>> time, because high memory pressure triggers reclaim of anonymous folios
>> into swap space. For example, on Android devices we found that this
>> proportion can reach 60% or more after a period of time. In addition,
>> the relatively long code path for releasing swap entries further
>> increases the time required to release them.
>>
>> Testing Platform: 8GB RAM
>> Testing procedure:
>> After booting up, start 15 processes first, and then observe the
>> physical memory size occupied by the last launched process at different
>> time points.
>> Example: The process launched last: com.qiyi.video
>> | memory type   | 0min   | 1min   | 5min   | 10min  | 15min  |
>> ----------------------------------------------------------------
>> | VmRSS(KB)     | 453832 | 252300 | 204364 | 199944 | 199748 |
>> | RssAnon(KB)   | 247348 |  99296 |  71268 |  67808 |  67660 |
>> | RssFile(KB)   | 205536 | 152020 | 132144 | 131184 | 131136 |
>> | RssShmem(KB)  |   1048 |    984 |    952 |    952 |    952 |
>> | VmSwap(KB)    | 202692 | 334852 | 362880 | 366340 | 366488 |
>> | Swap ratio(%) | 30.87% | 57.03% | 63.97% | 64.69% | 64.72% |
>> Note: min - minute. Swap ratio = VmSwap / (VmRSS + VmSwap).
>>
>> When there are multiple processes with independent mm and the system is
>> under high memory pressure, launching a process that requires a large
>> amount of memory is likely to trigger the immediate killing of many
>> processes with independent mm. Because the exiting processes occupy
>> multiple CPU cores while they run concurrently, important processes
>> that are not exiting can lag.
>>
>> To solve this problem, we introduce an asynchronous swap entry release
>> mechanism for multiple exiting processes. It isolates and caches the
>> swap entries occupied by the exiting processes and hands them over to
>> an asynchronous kworker, which completes the release. This allows the
>> exiting processes to finish quickly and free their CPU resources. We
>> have validated this change on Android products and achieved the
>> expected benefits.
>>
>> Testing Platform: 8GB RAM
>> Testing procedure:
>> After restarting the machine, start 15 app processes first, and then
>> start the camera app processes. We monitor the cold start and preview
>> times of the camera app processes.
>>
>> Test data for camera process cold start time (unit: millisecond):
>> | seq    | 1    | 2    | 3    | 4    | 5    | 6    | average |
>> | before | 1498 | 1476 | 1741 | 1337 | 1367 | 1655 | 1512    |
>> | after  | 1396 | 1107 | 1136 | 1178 | 1071 | 1339 | 1204    |
>>
>> Test data for camera process preview time (unit: millisecond):
>> | seq    | 1   | 2   | 3   | 4   | 5   | 6   | average |
>> | before | 267 | 402 | 504 | 513 | 161 | 265 | 352     |
>> | after  | 188 | 223 | 301 | 203 | 162 | 154 | 205     |
>>
>> Based on the averages of the six test runs above, the patch gives the
>> following improvements:
>> 1. The cold start time of camera app processes is reduced by about 20%.
>> 2. The preview time of camera app processes is reduced by about 42%.
>>
>> It offers several benefits:
>> 1. Alleviates the high system CPU load caused by multiple exiting
>> processes running simultaneously.
>> 2. Reduces lock contention in the swap entry free path by using one
>> asynchronous kworker instead of multiple exiting processes running in
>> parallel.
>> 3. Releases pte_present memory occupied by exiting processes more
>> efficiently.
>>
>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>> ---
>> arch/s390/include/asm/tlb.h | 8 +
>> include/asm-generic/tlb.h | 44 ++++++
>> include/linux/mm_types.h | 58 +++++++
>> mm/memory.c | 3 +-
>> mm/mmu_gather.c | 296 ++++++++++++++++++++++++++++++++++++
>> 5 files changed, 408 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
>> index e95b2c8081eb..3f681f63390f
>> --- a/arch/s390/include/asm/tlb.h
>> +++ b/arch/s390/include/asm/tlb.h
>> @@ -28,6 +28,8 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>> struct page *page, bool delay_rmap, int page_size);
>> static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
>> struct page *page, unsigned int nr_pages, bool delay_rmap);
>> +static inline bool __tlb_remove_swap_entries(struct mmu_gather *tlb,
>> + swp_entry_t entry, int nr);
>
>
> The problem I am having is that swap entries don't have any
> intersection with the TLB. It sounds like we're squeezing something
> into an existing concept (MMU gather) that just doesn't belong in there.
I modeled this on the batch release mechanism in the TLB code (mmu_gather).
Perhaps a new, separate structure needs to be created to implement this
feature instead.
Thanks
Zhiguo
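
For illustration only, here is a minimal userspace sketch of what a
standalone batch structure, decoupled from mmu_gather, could look like.
It is not the patch's code and not kernel code: a pthread stands in for
the asynchronous kworker, and every name in it (fake_swp_entry_t,
swap_batch, async_releaser, demo_exit_path, BATCH_MAX) is hypothetical.
The exiting side only caches entries into fixed-size batches and queues
them; a single worker drains the queue and performs the "release", which
here is just counting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define BATCH_MAX 4 /* hypothetical batch capacity */

/* Hypothetical stand-in for swp_entry_t; not the kernel type. */
typedef struct { unsigned long val; } fake_swp_entry_t;

struct swap_batch {
    struct swap_batch *next;
    unsigned int nr;
    fake_swp_entry_t entries[BATCH_MAX];
};

struct async_releaser {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct swap_batch *pending; /* queued batches awaiting release */
    int stopping;
    unsigned long released;     /* entries released so far */
    pthread_t worker;
};

/* Single worker: the only context that actually releases entries. */
static void *release_worker(void *arg)
{
    struct async_releaser *r = arg;

    pthread_mutex_lock(&r->lock);
    for (;;) {
        while (!r->pending && !r->stopping)
            pthread_cond_wait(&r->wake, &r->lock);
        if (!r->pending)
            break; /* stopping and fully drained */
        struct swap_batch *b = r->pending;
        r->pending = b->next;
        pthread_mutex_unlock(&r->lock);

        /* The real release work would happen here, outside the lock. */
        assert(b->nr <= BATCH_MAX);
        unsigned int nr = b->nr;
        free(b);

        pthread_mutex_lock(&r->lock);
        r->released += nr;
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

static void releaser_start(struct async_releaser *r)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->wake, NULL);
    r->pending = NULL;
    r->stopping = 0;
    r->released = 0;
    if (pthread_create(&r->worker, NULL, release_worker, r) != 0)
        abort();
}

/* Exiting side: hand a batch to the worker and return immediately. */
static void releaser_submit(struct async_releaser *r, struct swap_batch *b)
{
    pthread_mutex_lock(&r->lock);
    b->next = r->pending;
    r->pending = b;
    pthread_cond_signal(&r->wake);
    pthread_mutex_unlock(&r->lock);
}

/* Wait for the worker to drain the queue; return the released count. */
static unsigned long releaser_stop(struct async_releaser *r)
{
    pthread_mutex_lock(&r->lock);
    r->stopping = 1;
    pthread_cond_signal(&r->wake);
    pthread_mutex_unlock(&r->lock);
    pthread_join(r->worker, NULL);
    pthread_cond_destroy(&r->wake);
    pthread_mutex_destroy(&r->lock);
    return r->released;
}

/* Simulated exit path: cache entries into batches, queue full ones. */
unsigned long demo_exit_path(unsigned long nr_entries)
{
    struct async_releaser r;
    struct swap_batch *cur = NULL;

    releaser_start(&r);
    for (unsigned long i = 0; i < nr_entries; i++) {
        if (!cur && !(cur = calloc(1, sizeof(*cur))))
            abort();
        cur->entries[cur->nr++].val = i;
        if (cur->nr == BATCH_MAX) {
            releaser_submit(&r, cur);
            cur = NULL;
        }
    }
    if (cur)
        releaser_submit(&r, cur); /* flush the partial last batch */
    return releaser_stop(&r);
}
```

Calling demo_exit_path(n) from a main() and building with -pthread
exercises the whole path. Because only one worker takes batches off the
queue, the release step is serialized in a single context, which is the
property the cover text attributes to using one kworker instead of many
exiting processes.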