From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A866CC4828C for ; Thu, 1 Feb 2024 01:22:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3B2BB6B0081; Wed, 31 Jan 2024 20:22:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 38A256B0082; Wed, 31 Jan 2024 20:22:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 278296B0083; Wed, 31 Jan 2024 20:22:34 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1A1846B0081 for ; Wed, 31 Jan 2024 20:22:34 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id B10A5C061F for ; Thu, 1 Feb 2024 01:22:33 +0000 (UTC) X-FDA: 81741484986.29.B62BEB2 Received: from mail-lf1-f53.google.com (mail-lf1-f53.google.com [209.85.167.53]) by imf12.hostedemail.com (Postfix) with ESMTP id DB93D4000C for ; Thu, 1 Feb 2024 01:22:31 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=xAX04MjB; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706750552; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=R1i2UN1ZAwaudGKHw526hiiniRq9FbmCBbcGNJI+D8E=; b=ct7kO6EVpZ++k8TYg0Y7UNN1Wnf8LGQ7HwCnbTQ0QGZiPuZZ8a2aKt4ZVJLNEZ6ABzpsUA 4t5ehxHncdHa0eNLBrgFP3VNFBunwD/FBcF31z28bH1VHHhGFF7C8ttrKOz5sDV8LZxuB2 SfChYnr8gOZ0iyngCvZyokRLjf40v9c= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=xAX04MjB; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.167.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706750552; a=rsa-sha256; cv=none; b=g1sTPa+iLMMP45jytKlwbKcnI9L/3MUmuXxoAYU4Lzq4Szxs02Aw3UhogaCg//69o32rKj Ow/dgHVm6I64PPR008mi5j2zjcXro+eQ3vuOFjg5UtD8IbUQ9e3blRrLSR6lE7Q/Xa0Mck kDZoEgsjBk8SgpOcq32/KYFLGllaAx4= Received: by mail-lf1-f53.google.com with SMTP id 2adb3069b0e04-5111e5e4e2bso554967e87.3 for ; Wed, 31 Jan 2024 17:22:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706750550; x=1707355350; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=R1i2UN1ZAwaudGKHw526hiiniRq9FbmCBbcGNJI+D8E=; b=xAX04MjB/2o4OgkTBB2w8ySv71L3pwSs4JU1yY2+oQ31gB8TxP3HIKqHUn0daF4q51 4pbHX9qBOWVs15UyllKeVP8ZCcrJQQMwXZJEuyq7lEWB/P6wxGkyz9D+D08wLxpE7Adx 9xMuQel947ak9KEEtdRE4A/Eq3JT1eQ+8SgevfpdXilL3EbzPokc4YzlpAOoaFPopoBm ix0pDk660r4ZSbWEJ2UXmjOyF8aK/WR10ibpGZ7UE21J25HFnkk/JCRllirJXIpB2SrW fkql6UtUqGC9E88p+PbRax2qEHt3GGHo5AmnNInOj6T77K5Yg/Xn3Z4TGoWganSBogLg mAwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706750550; x=1707355350; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=R1i2UN1ZAwaudGKHw526hiiniRq9FbmCBbcGNJI+D8E=; b=ep9WETskoGHQ78mhctVtxDHeLQDBaK6ZPegi6RuiS7Q7RXe7sUi/VF9eyWsZkygmhC /oMnDPC8D9Z3J/MH0p5DUrpQbiibG+5xxuq9SXNxmiKWrfdYGGdFhMjFFwEKIkenWCYV AGn4bPfTt9dcFnRknNf1JGT6UMr3wTOiMUe5GaSDQhyqfVja+U1ZiEZf4u7sqhrX4pwF tR4tf0m+gGgFC1MiMCRu0hDcHIF/M6pQbqYM0P0vx/zWZQF8racPp7DrKFeO0Z2ZPv/Z wjsqMegRjOHmCb2wSXtvbizLGc5JqcgGhUIcv1fKCn/2Mo2+0ulDaXuWjbODndTR1F9v +L4A== X-Gm-Message-State: AOJu0YzkFcWQCOVXjciVJqSnnWJrxUcz7Sd+0eczMnMSvO2R0WMBGXIb byTb1mNs5y5+Re8zd3B6vSvO9IEWleUy5eRR26mIIRv3QuiEuCPzZPJexMuEZ6qYayL047Gkvjn +FRcuLgiUELwSx1nCWTswvF5JuyU9LwsZlcnI X-Google-Smtp-Source: AGHT+IH9V/rMJKDrUr81CPdq7YwY4UmiRjjoIicx5JWHtVQaKpzF9lrDkfQ+tDXAGHij8ax9K42I1YO4yp/spgipZxw= X-Received: by 2002:ac2:551c:0:b0:510:2582:5591 with SMTP id j28-20020ac2551c000000b0051025825591mr482052lfk.25.1706750549821; Wed, 31 Jan 2024 17:22:29 -0800 (PST) MIME-Version: 1.0 References: <20231221-async-free-v1-1-94b277992cb0@kernel.org> In-Reply-To: From: Yosry Ahmed Date: Wed, 31 Jan 2024 17:21:52 -0800 Message-ID: Subject: Re: [PATCH] mm: swap: async free swap slot cache entries To: Chris Li Cc: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, =?UTF-8?B?V2VpIFh177+8?= , =?UTF-8?B?WXUgWmhhb++/vA==?= , Greg Thelen , Chun-Tse Shao , =?UTF-8?Q?Suren_Baghdasaryan=EF=BF=BC?= , Brain Geffon , Minchan Kim , Michal Hocko , Mel Gorman , Huang Ying , Nhat Pham , Johannes Weiner , Kairui Song , Zhongkun He , Kemeng Shi , Barry Song Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: DB93D4000C X-Rspam-User: X-Stat-Signature: ighnkte3cjo4cxwoywemxw5zcubs3de5 X-Rspamd-Server: rspam01 X-HE-Tag: 1706750551-59653 X-HE-Meta: U2FsdGVkX1+kBt8fhDYje3nISrb0kNdN2goDJwDkphuY8y83UxxzHgxsA6a04upFHbjvbu1HIgErMZegBlPumIibFfMPC0n6WMa5yeLcLKzIAV31FzgeGmqYk3HUcnV4WSb27oFjqEfpQKZBIwuMbEIM6o7MnsQE3npBv5TsEfedmFmm81dM7Jle2+hprocAh02XsL3TG4y0sVCz6XDo6Dg5GFn0BWWTGm7onIu9+k1YkzF34YIxGVt12P4XyGq7wbLpV2+MsRwjKFWH1d3kz0dp5iIzY7086zjJqxYDzUpRGMxqACdldFkTR+BMcgAnndCG8ZZPmF24YJv/4K2li44fXmUuVrv69NM5hqrlRNffAKUHJvoTH5EfCX4aeOJ8s9e6/R+IBwOrlBMhfrscZfeqtUA68n99yxeo0JIfegwavA5x5mRODHZmMLjBHdGYZ9/9eb0sPbuSOH0WDwKvXhTJ4sK7J5rDDUJtSlFYcBXDcsHSH7BUk+H1vu4Z5opOydYi0BngXTyEjJvOik1B35apdfdRwLBo3PPisPAtd3aBfoX5xsTY6Y9629v+ZusELEVhJXVR3KoK6A3XwdjDl4gO6zCQP/y+fAJnvGac1L3YpXvz1jVqJ1V0ra22tf4kKnsiHUfYpCbcQBOVGsydZ8V+zKHzgXarW3WIAJ+8dobxjdiCfGORahwWRM/g+MXh+nMsNuP2XRSGlvBRdBEnhTPNd7+hDRP9BgOK44d1HIBJsqGpwL34Elqr7auZNko9l1ZK5bEKi11WZ+/LlWP2zJ0jT9GlIIZzgfAoh/3G6Ry+BocQcD4BDqh5rte47r+RvzUGC/8ZHSvWdWqnWX4Wg8SxTrn4bbXq0iSU1Xc3rk02IOScsT9w+I87VykQJkjPRdM9W6kVf5ih/4f6X31MnS1o6MmYJNoFcNGnZjQe7AXhi8yZLWkR8HKJ//rfRzagBWGxOocsrXE8i+L0J6t bk364CqF 7c0nTZU9uN9z/JKjK2UgSqAYCjeOCE/xjaLYeZACi4gNb004C26CxudDZIWeSpVhohdQHqOMCp6u7b9e4h1gmIK8jPd8sGK9bv0OTFpuMfMei82ETRLJvdPzSvjNFzauKeNE5A2W1Tva8Yg2w5bcSBHh3Tq06jjSUO9p0ce8smuY1/T3ZEf2eR6TJITAcFUqylWBTNf8QByd7P3zwEBaoPLqn5/DFYBGGpqSNmGUpiZ6b31rVXqkXXX1CIqtWuZAc4C5oaw1L+CtPb1tp5oktl5sHW//kOyxv7e6frqulu0SbS0cALArFAqYLNg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 31, 2024 at 4:57=E2=80=AFPM Chris Li wrote: > > Hi Yosry, > > On Thu, Dec 28, 2023 at 7:34=E2=80=AFAM Yosry Ahmed wrote: > > > > On Thu, Dec 21, 2023 at 10:25=E2=80=AFPM Chris Li w= rote: > > > > > > We discovered that 1% swap page fault is 100us+ while 50% of > > > the swap fault is under 20us. > > > > > > Further investigation show that a large portion of the time > > > spent in the free_swap_slots() function for the long tail case. > > > > > > The percpu cache of swap slots is freed in a batch of 64 entries > > > inside free_swap_slots(). These cache entries are accumulated > > > from previous page faults, which may not be related to the current > > > process. > > > > > > Doing the batch free in the page fault handler causes longer > > > tail latencies and penalizes the current process. > > > > > > Move free_swap_slots() outside of the swapin page fault handler into = an > > > async work queue to avoid such long tail latencies. > > > > > > Testing: > > > > > > Chun-Tse did some benchmark in chromebook, showing that > > > zram_wait_metrics improve about 15% with 80% and 95% confidence. > > > > > > I recently ran some experiments on about 1000 Google production > > > machines. It shows swapin latency drops in the long tail > > > 100us - 500us bucket dramatically. > > > > > > platform (100-500us) (0-100us) > > > A 1.12% -> 0.36% 98.47% -> 99.22% > > > B 0.65% -> 0.15% 98.96% -> 99.46% > > > C 0.61% -> 0.23% 98.96% -> 99.38% > > > > I recall you mentioning that mem_cgroup_uncharge_swap() is the most > > expensive part of the batched freeing. If that's the case, I am > > curious what happens if we move that call outside of the batching > > (i.e. once the swap entry is no longer used and will be returned to > > the cache). This should amortize the cost of memcg uncharging and > > reduce the tail fault latency without extra work. Arguably, it could > > increase the average fault latency, but not necessarily in a > > significant way. > > > > Ying pointed out something similar if I understand correctly (and > > other operations that can be moved). > > If the goal is to let the swap fault return as soon as possible. Then > the current approach is better. > mem_cgroup_uncarge_swap() is only part of it. Not close to all of it. I think there are a lot of operations that we can move out of swapcache_free_entries(): - mem_cgroup_uncharge_swap() - arch_swap_invalidate_page() - zswap_invalidate() - clear_shadow_from_swap_cache() , and maybe others. I am curious, if we move these operations from the batched freeing, would this remove the increased tail latency and make it more consistent, without doing extra work? I believe this is what Ying was also asking about. > > > > > Also, if we choose to follow this route, I think there we should flush > > the async worker in drain_slots_cache_cpu(), right? > Not sure I understand this part. The drain_slots_cache_cpu(), will > free the entries already. The current lock around cache->free_lock > should protect async workers accessing the entries. What do you mean > by flushing? Never mind. I just realized that the percpu caches are static, so they are not freed in drain_slots_cache_cpu(). The NULL check in the async worker should be enough protection.