From: Barry Song <21cnbao@gmail.com>
Date: Fri, 2 Aug 2024 22:42:24 +1200
Subject: Re: [PATCH v2 0/3] mm: tlb swap entries batch async release
To: zhiguojiang
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Will Deacon, "Aneesh Kumar K.V", Nick Piggin, Peter Zijlstra,
 Arnd Bergmann, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song, linux-arch@vger.kernel.org,
 cgroups@vger.kernel.org, kernel test robot, opensource.kernel@vivo.com
References: <20240731133318.527-1-justinjiang@vivo.com>
 <20240731091715.b78969467c002fa3a120e034@linux-foundation.org>
 <820dfff4-1f09-474e-aa68-30d779a72fed@vivo.com>
In-Reply-To: <820dfff4-1f09-474e-aa68-30d779a72fed@vivo.com>

On Thu, Aug 1, 2024 at 10:33 PM zhiguojiang wrote:
>
>
>
> On 2024/8/1 15:36, Barry Song wrote:
> > On Thu, Aug 1, 2024 at 2:31 PM zhiguojiang wrote:
> >>
> >> On 2024/8/1 0:17, Andrew Morton wrote:
> >>>
> >>> On Wed, 31 Jul 2024 21:33:14 +0800 Zhiguo Jiang wrote:
> >>>
> >>>> The main reason for the prolonged exit of a background process is the
> >>> The kernel really doesn't have a concept of a "background process".
> >>> It's a userspace concept - perhaps "the parent process isn't waiting on
> >>> this process via wait()".
> >>>
> >>> I assume here you're referring to an Android userspace concept?  I
> >>> expect that when Android "backgrounds" a process, it does lots of
> >>> things to that process.  Perhaps scheduling priority, perhaps
> >>> alteration of various MM tunables, etc.
> >>>
> >>> So rather than referring to "backgrounding" it would be better to
> >>> identify what tuning alterations are made to such processes to bring
> >>> about this behavior.
> >> Hi Andrew Morton,
> >>
> >> Thank you for your review and comments.
> >>
> >> You are right. The "background process" here refers to the process
> >> corresponding to an Android application switched to the background.
> >> In fact, this patch is applicable to any exiting process.
> >>
> >> To explain the concept of "multiple exiting processes" further: it
> >> refers to different processes that each own an independent mm, rather
> >> than sharing the same mm.
> >>
> >> I will use "mm" instead of "background" to describe the process in the
> >> next version.
> >>>> time-consuming release of its swap entries. The proportion of swap memory
> >>>> occupied by the background process increases with its duration in the
> >>>> background, and after a period of time, this value can reach 60% or more.
> >>> Again, what is it about the tuning of such processes which causes this
> >>> behavior?
> >> When the system is low on memory, memory reclaim is triggered and the
> >> anonymous folios of the process are continuously reclaimed, which
> >> increases the number of swap entries occupied by this process. So when
> >> the process is killed, releasing its swap entries takes longer the
> >> more of them have accumulated over time.
> >>
> >> Test data for the physical memory occupied by a process at different
> >> time points:
> >> Testing Platform: 8GB RAM
> >> Testing procedure:
> >> After booting up, start 15 processes first, and then observe the
> >> physical memory size occupied by the last launched process at
> >> different time points.
> >>
> >> Example:
> >> The process launched last: com.qiyi.video
> >> | memory type   | 0min   | 1min   | BG 5min | BG 10min | BG 15min |
> >> -------------------------------------------------------------------
> >> | VmRSS(KB)     | 453832 | 252300 | 204364  | 199944   | 199748   |
> >> | RssAnon(KB)   | 247348 | 99296  | 71268   | 67808    | 67660    |
> >> | RssFile(KB)   | 205536 | 152020 | 132144  | 131184   | 131136   |
> >> | RssShmem(KB)  | 1048   | 984    | 952     | 952      | 952      |
> >> | VmSwap(KB)    | 202692 | 334852 | 362880  | 366340   | 366488   |
> >> | Swap ratio(%) | 30.87% | 57.03% | 63.97%  | 64.69%   | 64.72%   |
> >> min - minute.
> >>
> >> Based on the above data, we can see that the swap ratio of the
> >> process gradually increases over time.
> > If I understand correctly, during zap_pte_range(), if 64.72% of the anonymous
> > pages are actually swapped out, you end up zapping 100 PTEs but only freeing
> > 35.28 pages of memory. By doing this asynchronously, you prevent the
> > swap_release operation from blocking the process of zapping normal
> > PTEs that are mapping to memory.
> >
> > Could you provide data showing the improvements after implementing
> > asynchronous freeing of swap entries?
> Hi Barry,
>
> Your understanding is correct.
> From the perspective of releasing the physical memory occupied by
> the exiting process, having an asynchronous kworker release the swap
> entries does let the exiting process release its pte_present memory
> (e.g. file and anonymous folios) faster.
>
> In addition, from the perspective of CPU resources, when multiple
> exiting processes run simultaneously, using a single asynchronous
> kworker instead of the exiting processes themselves to release swap
> entries frees up more CPU cores for the current non-exiting,
> important processes, thereby improving the user experience of those
> processes. I think this is the main contribution of this modification.
>
> Example:
> When there are multiple processes and system memory is low, starting
> the camera processes triggers the near-instantaneous killing of many
> processes, because the camera processes need to allocate a large
> amount of memory. This results in multiple exiting processes running
> simultaneously. These exiting processes compete with the camera
> processes for CPU resources, and because of scheduling delays they
> release their physical memory slowly, which ultimately slows down the
> camera processes.
>
> With this optimization, multiple exiting processes can exit quickly,
> freeing up their CPU resources and their pte_present physical memory,
> which improves the running speed of the camera processes.
>
> Testing Platform: 8GB RAM
> Testing procedure:
> After restarting the machine, start 15 app processes first, and then
> start the camera app processes. We monitor the cold start and preview
> time data of the camera app processes.
>
> Test data for camera process cold start time (unit: millisecond):
> | seq    | 1    | 2    | 3    | 4    | 5    | 6    | average |
> | before | 1498 | 1476 | 1741 | 1337 | 1367 | 1655 | 1512    |
> | after  | 1396 | 1107 | 1136 | 1178 | 1071 | 1339 | 1204    |
>
> Test data for camera process preview time (unit: millisecond):
> | seq    | 1   | 2   | 3   | 4   | 5   | 6   | average |
> | before | 267 | 402 | 504 | 513 | 161 | 265 | 352     |
> | after  | 188 | 223 | 301 | 203 | 162 | 154 | 205     |
>
> Based on the averages of the six test runs above, the benefits of
> the patch are:
> 1. The cold start time of camera app processes is reduced by about 20%.
> 2. The preview time of camera app processes is reduced by about 42%.

This sounds quite promising. I understand that asynchronous releasing
of swap entries can help killed processes free memory more quickly,
allowing your camera app to access it faster. However, I'm unsure
about the impact of swap-related lock contention. My intuition is that
it might not be significant, given that the cluster uses a single lock
and its relatively small size helps distribute the swap locks.

Anyway, I'm very interested in your patchset and can certainly
appreciate its benefits from my own experience working on phones. I'm
quite busy with other issues at the moment, but I hope to provide you
with detailed comments in about two weeks.

> >
> >>>> Additionally, the relatively lengthy path for releasing swap entries
> >>>> further contributes to the longer time required for the background process
> >>>> to release its swap entries.
> >>>>
> >>>> In the multiple background applications scenario, when launching a large
> >>>> memory application such as a camera, the system may enter a low memory state,
> >>>> which will trigger the killing of multiple background processes at the
> >>>> same time.
> >>>> Due to multiple exiting processes occupying multiple CPUs for
> >>>> concurrent execution, the CPU resources available to the current
> >>>> foreground application become tight, which may cause issues such as
> >>>> lagging.
> >>>>
> >>>> To solve this problem, we have introduced the multiple exiting process
> >>>> asynchronous swap memory release mechanism, which isolates and caches
> >>>> swap entries occupied by multiple exiting processes, and hands them over
> >>>> to an asynchronous kworker to complete the release. This allows the
> >>>> exiting processes to complete quickly and release CPU resources. We have
> >>>> validated this modification on the products and achieved the expected
> >>>> benefits.
> >>> Dumb question: why can't this be done in userspace?  The exiting
> >>> process does fork/exit and lets the child do all this asynchronous freeing?
> >> The optimization of how the kernel releases swap entries cannot be
> >> implemented in userspace. The multiple exiting processes here each own
> >> an independent mm, rather than being parent and child processes that
> >> share the same mm. Therefore, when the kernel runs multiple exiting
> >> processes simultaneously, they will definitely occupy multiple CPU
> >> cores to complete the work.
> >>>> It offers several benefits:
> >>>> 1. Alleviate the high system cpu load caused by multiple exiting
> >>>>    processes running simultaneously.
> >>>> 2. Reduce lock competition in swap entry free path by an asynchronous
> >>>>    kworker instead of multiple exiting processes parallel execution.
> >>> Why is lock contention reduced?  The same amount of work needs to be
> >>> done.
> >> When multiple CPU cores simultaneously release the swap entries that
> >> belong to different exiting processes, the cluster lock or swapinfo
> >> lock may be contended. When a single asynchronous kworker that
> >> occupies only one CPU core does this work instead, the probability of
> >> lock contention is reduced and the remaining CPU cores are freed up
> >> for other non-exiting processes to use.
> >>>> 3. Release memory occupied by exiting processes more efficiently.
> >>> Probably it's slightly less efficient.
> >> We observed that using an asynchronous kworker can result in more free
> >> memory earlier. When multiple processes exit simultaneously, due to
> >> competition for CPU cores, these exiting processes remain in a
> >> runnable state for a long time and cannot release their occupied
> >> memory in a timely manner.
> >>> There are potential problems with this approach of passing work to a
> >>> kernel thread:
> >>>
> >>> - The process will exit while its resources are still allocated.  But
> >>>   its parent process assumes those resources are now all freed and the
> >>>   parent process then proceeds to allocate resources.  This results in
> >>>   a time period where peak resource consumption is higher than it was
> >>>   before such a change.
> >> - I don't think this modification will cause such a problem. Perhaps I
> >>   haven't fully understood your meaning yet. Can you give me a specific
> >>   example?
> > Normally, after completing zap_pte_range, your swap slots are returned to
> > the swap file, except for a few slot caches. However, with the asynchronous
> > approach, it means that even after your process has completely exited,
> > some swap slots might still not be released to the system. This could
> > potentially starve other processes waiting for swap slots to perform
> > swap-outs.
> > I assume this isn't a critical issue for you because, in the
> > case of killing processes, freeing up memory is more important than
> > releasing swap entries?
> I did not encounter issues caused by the slow release of swap entries
> by the asynchronous kworker during our testing. Normally, the
> asynchronous kworker can also release cached swap entries in a short
> period of time. Of course, if the system allows, the running priority
> of the asynchronous kworker should be raised appropriately in order to
> release swap entries faster, which is also beneficial for the system.
>
> The swapped-out data for swap entries is also compressed and stored in
> the zram memory space, so it is relatively important to release the zram
> memory space corresponding to swap entries as soon as possible.
>
>
> >>> - If all CPUs are running in userspace with realtime policy
> >>>   (SCHED_FIFO, for example) then the kworker thread will not run,
> >>>   indefinitely.
> >> - In my clumsy understanding, the execution priority of kernel threads
> >>   should not be lower than that of the exiting process, and the
> >>   asynchronous kworker execution should only be triggered when the
> >>   process exits. The exiting process should not be set to SCHED_FIFO,
> >>   so when the exiting process is executed, the asynchronous kworker
> >>   should also have the opportunity to run in a timely manner.
> >>> - Work which should have been accounted to the exiting process will
> >>>   instead go unaccounted.
> >> - You are right, the statistics of process exit time may no longer be
> >>   complete.
> >>> So please fully address all these potential issues.
> >> Thanks
> >> Zhiguo

Thanks
Barry
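
For readers following the thread, the hand-off under discussion (exiting
tasks queue batches of swap entries, and one asynchronous worker releases
them) can be illustrated with a small userspace analogy. This is only a
sketch under stated assumptions: it is Python rather than kernel C, it is
not code from the patch set, and the names BATCH_SIZE, exit_task, worker
and run_demo are hypothetical. It also makes the concern raised earlier
visible: the "exiting" threads finish before their entries are released.

```python
import queue
import threading

BATCH_SIZE = 4  # hypothetical batch size, chosen only for this demo


def worker(work_queue, freed):
    """Single background worker: releases queued batches until it
    receives the None sentinel."""
    while True:
        batch = work_queue.get()
        if batch is None:  # sentinel: nothing more will be queued
            break
        # Stand-in for the expensive per-entry release work.
        freed.extend(batch)


def exit_task(entries, work_queue):
    """An 'exiting task' only groups its entries into batches and queues
    them; it does not perform the release itself."""
    for i in range(0, len(entries), BATCH_SIZE):
        work_queue.put(entries[i:i + BATCH_SIZE])


def run_demo():
    work_queue = queue.Queue()
    freed = []  # only the worker thread appends to this list

    releaser = threading.Thread(target=worker, args=(work_queue, freed))
    releaser.start()

    # Three exiting tasks, each owning a disjoint set of entries.
    tasks = [
        threading.Thread(target=exit_task,
                         args=(list(range(base, base + 10)), work_queue))
        for base in (0, 100, 200)
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()  # tasks are done as soon as their batches are queued

    work_queue.put(None)  # queued after all batches, so it is seen last
    releaser.join()
    return sorted(freed)
```

Calling run_demo() returns the sorted list of all released entries. The
point of the design is that the release work is serialized on one worker
while the producers return early; whether that trade-off is acceptable
in the kernel is exactly what the thread is debating.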