From: Barry Song <21cnbao@gmail.com>
Date: Thu, 1 Aug 2024 15:36:52 +0800
Subject: Re: [PATCH v2 0/3] mm: tlb swap entries batch async release
To: zhiguojiang
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Will Deacon, "Aneesh Kumar K.V", Nick Piggin, Peter Zijlstra, Arnd Bergmann, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, linux-arch@vger.kernel.org, cgroups@vger.kernel.org, kernel test robot, opensource.kernel@vivo.com
References: <20240731133318.527-1-justinjiang@vivo.com> <20240731091715.b78969467c002fa3a120e034@linux-foundation.org>

On Thu, Aug 1, 2024 at 2:31 PM zhiguojiang wrote:
>
> On 2024/8/1 0:17, Andrew Morton wrote:
> > On Wed, 31 Jul 2024 21:33:14 +0800 Zhiguo Jiang wrote:
> >
> >> The main reason for the prolonged exit of a background process is the
> > The kernel really doesn't have a concept of a "background process".
> > It's a userspace concept - perhaps "the parent process isn't waiting on
> > this process via wait()".
> >
> > I assume here you're referring to an Android userspace concept?  I
> > expect that when Android "backgrounds" a process, it does lots of
> > things to that process.  Perhaps scheduling priority, perhaps
> > alteration of various MM tunables, etc.
> >
> > So rather than referring to "backgrounding" it would be better to
> > identify what tuning alterations are made to such processes to bring
> > about this behavior.
> Hi Andrew Morton,
>
> Thank you for your review and comments.
>
> You are right. The "background process" here refers to the process
> corresponding to an Android application switched to the background.
> In fact, this patch is applicable to any exiting process.
>
> To explain "multiple exiting processes" further: it refers to
> different processes that each own an independent mm, rather than
> sharing the same mm.
>
> I will use "mm" to describe the process instead of "background" in the
> next version.
> >
> >> time-consuming release of its swap entries. The proportion of swap memory
> >> occupied by the background process increases with its duration in the
> >> background, and after a period of time, this value can reach 60% or more.
> > Again, what is it about the tuning of such processes which causes this
> > behavior?
> When the system is low on memory, memory reclaim is triggered and
> anonymous folios in the process are continuously reclaimed, resulting
> in an increase in the swap entries occupied by this process. So the
> longer the process has been running, the more time it takes to release
> its swap entries when it is killed.
>
> Test data for the physical memory occupied by a process at
> different time points:
> Testing Platform: 8GB RAM
> Testing procedure:
> After booting up, start 15 processes first, and then observe the
> physical memory size occupied by the last launched process at
> different time points.
>
> Example:
> The process launched last: com.qiyi.video
> |  memory type  |  0min  |  1min  | BG 5min | BG 10min | BG 15min |
> -------------------------------------------------------------------
> |     VmRSS(KB) | 453832 | 252300 |  204364 |   199944 |   199748 |
> |   RssAnon(KB) | 247348 |  99296 |   71268 |    67808 |    67660 |
> |   RssFile(KB) | 205536 | 152020 |  132144 |   131184 |   131136 |
> |  RssShmem(KB) |   1048 |    984 |     952 |      952 |      952 |
> |    VmSwap(KB) | 202692 | 334852 |  362880 |   366340 |   366488 |
> | Swap ratio(%) | 30.87% | 57.03% |  63.97% |   64.69% |   64.72% |
> min - minute.
>
> Based on the above data, we can see that the swap ratio occupied by
> the process gradually increases over time.

If I understand correctly, during zap_pte_range(), if 64.72% of the anonymous
pages are actually swapped out, you end up zapping 100 PTEs but only freeing
35.28 pages of memory. By doing this asynchronously, you prevent the
swap_release operation from blocking the process of zapping normal
PTEs that are mapping to memory.

Could you provide data showing the improvements after implementing
asynchronous freeing of swap entries?

> >
> >> Additionally, the relatively lengthy path for releasing swap entries
> >> further contributes to the longer time required for the background process
> >> to release its swap entries.
> >>
> >> In the multiple background applications scenario, when launching a large
> >> memory application such as a camera, the system may enter a low memory state,
> >> which will trigger the killing of multiple background processes at the
> >> same time.
> >> Due to multiple exiting processes occupying multiple CPUs for
> >> concurrent execution, the current foreground application's CPU resources
> >> are tight and may cause issues such as lagging.
> >>
> >> To solve this problem, we have introduced the multiple exiting process
> >> asynchronous swap memory release mechanism, which isolates and caches
> >> swap entries occupied by multiple exiting processes, and hands them over
> >> to an asynchronous kworker to complete the release. This allows the
> >> exiting processes to complete quickly and release CPU resources. We have
> >> validated this modification on the products and achieved the expected
> >> benefits.
> > Dumb question: why can't this be done in userspace?  The exiting
> > process does fork/exit and lets the child do all this asynchronous freeing?
> The optimization of how the kernel releases swap entries cannot be
> implemented in userspace. The multiple exiting processes here each own
> an independent mm, rather than being parent and child processes that
> share the same mm. Therefore, when the kernel handles multiple exiting
> processes simultaneously, they will definitely occupy multiple CPU cores
> to complete the work.
> >> It offers several benefits:
> >> 1. Alleviate the high system cpu load caused by multiple exiting
> >>    processes running simultaneously.
> >> 2. Reduce lock competition in swap entry free path by an asynchronous
> >>    kworker instead of multiple exiting processes parallel execution.
> > Why is lock contention reduced?  The same amount of work needs to be
> > done.
> When multiple CPU cores simultaneously release swap entries belonging
> to different exiting processes, the cluster lock or swapinfo lock may
> be contended. When a single asynchronous kworker that occupies only
> one CPU core does this work instead, the probability of lock
> contention is reduced and the remaining CPU cores are freed up for
> other non-exiting processes to use.
> >
> >> 3. Release memory occupied by exiting processes more efficiently.
> > Probably it's slightly less efficient.
> We observed that using an asynchronous kworker can result in more free
> memory earlier. When multiple processes exit simultaneously, competition
> for CPU cores keeps these exiting processes in a runnable state for a
> long time, so they cannot release their occupied memory in a timely
> manner.
> >
> > There are potential problems with this approach of passing work to a
> > kernel thread:
> >
> > - The process will exit while its resources are still allocated.  But
> >   its parent process assumes those resources are now all freed and the
> >   parent process then proceeds to allocate resources.  This results in
> >   a time period where peak resource consumption is higher than it was
> >   before such a change.
> - I don't think this modification will cause such a problem. Perhaps I
>   haven't fully understood your meaning yet. Can you give me a specific
>   example?

Normally, after completing zap_pte_range, your swap slots are returned to
the swap file, except for a few slot caches. However, with the asynchronous
approach, it means that even after your process has completely exited,
some swap slots might still not be released to the system. This could
potentially starve other processes waiting for swap slots to perform
swap-outs. I assume this isn't a critical issue for you because, in the
case of killing processes, freeing up memory is more important than
releasing swap entries?

> > - If all CPUs are running in userspace with realtime policy
> >   (SCHED_FIFO, for example) then the kworker thread will not run,
> >   indefinitely.
> - In my clumsy understanding, the execution priority of kernel threads
>   should not be lower than that of the exiting process, and the
>   asynchronous kworker execution should only be triggered when the
>   process exits.
>   The exiting process should not be set to SCHED_FIFO,
>   so when the exiting process is executed, the asynchronous kworker
>   should also have the opportunity to run in a timely manner.
> > - Work which should have been accounted to the exiting process will
> >   instead go unaccounted.
> - You are right, the statistics of process exit time may no longer be
>   complete.
> > So please fully address all these potential issues.
> Thanks
> Zhiguo
>

Thanks
Barry
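
For anyone who wants to re-check the swap ratio row in the quoted table, here is a minimal Python sketch. It assumes the ratio is defined as VmSwap / (VmRSS + VmSwap); the thread does not state the formula, but this definition reproduces the quoted percentages. The names SAMPLES_KB, swap_ratio and resident_ptes_per_100 are made up for illustration and are not part of the patch.

```python
# Values in KB, copied from the quoted com.qiyi.video table:
# label -> (VmRSS, VmSwap)
SAMPLES_KB = {
    "0min": (453832, 202692),
    "1min": (252300, 334852),
    "BG 5min": (204364, 362880),
    "BG 10min": (199944, 366340),
    "BG 15min": (199748, 366488),
}


def swap_ratio(vm_rss_kb, vm_swap_kb):
    """Percentage of the footprint (RSS + swap) that sits in swap.

    Assumed definition; it matches the table but is not stated in the thread.
    """
    return 100.0 * vm_swap_kb / (vm_rss_kb + vm_swap_kb)


def resident_ptes_per_100(ratio_percent):
    """Out of 100 zapped PTEs, how many still map resident pages."""
    return 100.0 - ratio_percent


if __name__ == "__main__":
    for label, (rss, swap) in SAMPLES_KB.items():
        ratio = swap_ratio(rss, swap)
        print(f"{label:>8}: swap ratio {ratio:.2f}%, "
              f"resident PTEs per 100: {resident_ptes_per_100(ratio):.2f}")
```

The second helper is only the complement of the ratio, which is the quantity the zap_pte_range() remark above is about: the higher the swap ratio, the fewer zapped PTEs actually return a resident page.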