From: Chris Li <chrisl@kernel.org>
Date: Wed, 31 Jan 2024 16:43:05 -0800
Subject: Re: [PATCH] mm: swap: async free swap slot cache entries
To: "Huang, Ying"
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan, Yosry Ahmed, Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Nhat Pham, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi, Barry Song, Hugh Dickins, Tim Chen
In-Reply-To: <87o7eeg3ow.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20231221-async-free-v1-1-94b277992cb0@kernel.org> <20231222115208.ab4d2aeacdafa4158b14e532@linux-foundation.org> <87o7eeg3ow.fsf@yhuang6-desk2.ccr.corp.intel.com>
Hi Ying,

Sorry for the late reply.

On Sun, Dec 24, 2023 at 11:10 PM Huang, Ying wrote:
>
> Chris Li writes:
>
> > On Fri, Dec 22, 2023 at 11:52:08AM -0800, Andrew Morton wrote:
> >> On Thu, 21 Dec 2023 22:25:39 -0800 Chris Li wrote:
> >>
> >> > We discovered that 1% of swap page faults take 100us+ while 50% of
> >> > the swap faults are under 20us.
> >> >
> >> > Further investigation shows that a large portion of the time is
> >> > spent in the free_swap_slots() function for the long tail case.
> >> >
> >> > The percpu cache of swap slots is freed in a batch of 64 entries
> >> > inside free_swap_slots(). These cache entries are accumulated
> >> > from previous page faults, which may not be related to the current
> >> > process.
> >> >
> >> > Doing the batch free in the page fault handler causes longer
> >> > tail latencies and penalizes the current process.
> >> >
> >> > Move free_swap_slots() outside of the swapin page fault handler into an
> >> > async work queue to avoid such long tail latencies.
> >>
> >> This will require a larger amount of total work than the current
> >
> > Yes, there will be a tiny bit of extra overhead to schedule the job
> > on to the other work queue.
> >
> >> scheme. So we're trading that off against better latency.
> >>
> >> Why is this a good tradeoff?
> >
> > That is a very good question. Both Hugh and Wei had asked me similar questions
> > before. +Hugh.
> >
> > The TL;DR is that it makes the swap path more parallelizable.
> >
> > Because modern computers typically have more than one CPU and CPU utilization
> > rarely reaches 100%, we are not really trading latency away by making someone
> > else run slower. Most of the time the real impact is that the current swapin
> > page fault can return quicker, so more work can be submitted to the kernel
> > sooner, while at the same time another idle CPU can pick up the
> > non-latency-critical work of freeing the swap slot cache entries. The net
> > effect is that we speed things up and increase the overall system utilization
> > rather than slow things down.
>
> Your solution depends on there being enough idle time in the system. This
> isn't always true.
>
> In general, all async solutions have 2 possible issues.
>
> a) Unrelated applications may be punished, because they may wait for
> the CPU which is running the async operations.
> In the original solution, the application that swaps more will be punished.

The typical time to perform the async free is very brief, at about the
100ms level, so the amount of punishment would be small. The original
behavior already delayed the freeing of swap slots due to batching, so
adding a tiny bit of extra time does not change the overall behavior
much. Another thing is that, if an async free is still pending, new
entries will go through the direct free path.

> b) The CPU time cannot be charged to appropriate applications. The
> original behavior isn't perfect either, but it's better than an async worker.

Yes, the original behavior will free other cgroups' swap entries.

> Given the runtime of the worker is at the 100us level, these issues may not be
> severe. But I think that you may need to explain them at least.

Thanks for the suggestion. Will do in V2.

> And, when swap slots freeing batching was introduced, it was mainly used
> to reduce the lock contention on sis->lock (via swap_info_get_cont()).
> So, we may move some operations (e.g., mem_cgroup_uncharge_swap(),
> clear_shadow_from_swap_cache(), etc.) out of the batched operation (before
> calling free_swap_slot()) to reduce the latency impact.

That is good to know. Thanks for the explanation.

Chris

> >
> > The test results on Chromebooks and Google production servers should be able
> > to show that it is beneficial to both laptop and server workloads, making
> > them more responsive in swap-related workloads.
>
> --
> Best Regards,
> Huang, Ying
>