From: Kairui Song
Date: Wed, 19 Feb 2025 16:34:45 +0800
Subject: Re: [PATCH 5/7] mm, swap: use percpu cluster as allocation fast path
To: Baoquan He
Cc: linux-mm@kvack.org, Andrew Morton, Chris Li, Barry Song, Hugh Dickins,
 Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh,
 linux-kernel@vger.kernel.org
References: <20250214175709.76029-1-ryncsn@gmail.com>
 <20250214175709.76029-6-ryncsn@gmail.com>

On Wed, Feb 19, 2025 at 3:54 PM Baoquan He wrote:

Hi Baoquan,

Thanks for the review!

> On 02/15/25 at 01:57am, Kairui Song wrote:
> > From: Kairui Song
> >
> > Current allocation workflow first traverses the plist with a global lock
> > held, after choosing a device, it uses the percpu cluster on that swap
> > device. This commit moves the percpu cluster variable out of being tied
> > to individual swap devices, making it a global percpu variable, and will
> > be used directly for allocation as a fast path.
> >
> > The global percpu cluster variable will never point to a HDD device, and
> > allocation on HDD devices is still globally serialized.
> >
> > This improves the allocator performance and prepares for removal of the
> > slot cache in later commits. There shouldn't be much observable behavior
> > change, except one thing: this changes how swap device allocation
> > rotation works.
> >
> > Currently, each allocation will rotate the plist, and because of the
> > existence of slot cache (64 entries), swap devices of the same priority
> > are rotated for every 64 entries consumed. And, high order allocations
> > are different, they will bypass the slot cache, and so swap device is
> > rotated for every 16K, 32K, or up to 2M allocation.
> >
> > The rotation rule was never clearly defined or documented, it was changed
> > several times without mentioning too.
> >
> > After this commit, once slot cache is gone in later commits, swap device
> > rotation will happen for every consumed cluster. Ideally non-HDD devices
> > will be rotated if 2M space has been consumed for each order, this seems
>
> This breaks the rule where the high priority swap device is always taken
> to allocate as long as there's free space in the device. After this patch,
> it will try the percpu cluster firstly which is lower priority even though
> the higher priority device has free space. However, this only happens when
> the higher priority device is exhausted, not a generic case. If this is
> expected, it may need be mentioned in log or doc somewhere at least.

Hmm, actually this rule was already broken if you are very strict about it.
The current percpu slot cache does a pre-allocation, so the high
priority device will be removed from the plist while some CPU's slot
cache is still holding usable entries.

If the high priority device is exhausted, some CPU's percpu cluster
will indeed point to a low priority device, and keep using it until the
percpu cluster is drained. I think this should be OK. The high priority
device is already full, so having some swapouts fall back to the low
priority device is only a performance issue. I think it's a tiny change
for a rare case.

>
> > reasonable. HDD devices is rotated for every allocation regardless of the
> > allocation order, which should be OK and trivial.
> >
> > Signed-off-by: Kairui Song
> > ---
> >  include/linux/swap.h |  11 ++--
> >  mm/swapfile.c        | 120 +++++++++++++++++++++++++++----------------
> >  2 files changed, 79 insertions(+), 52 deletions(-)
> ......
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index ae3bd0a862fc..791cd7ed5bdf 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -116,6 +116,18 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
> >
> ......snip....
> >  int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
> >  {
> >  	int order = swap_entry_order(entry_order);
> > @@ -1211,19 +1251,28 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
> >  	int n_ret = 0;
> >  	int node;
> >
> > +	/* Fast path using percpu cluster */
> > +	local_lock(&percpu_swap_cluster.lock);
> > +	n_ret = swap_alloc_fast(swp_entries,
> > +				SWAP_HAS_CACHE,
> > +				order, n_goal);
> > +	if (n_ret == n_goal)
> > +		goto out;
> > +
> > +	n_goal = min_t(int, n_goal - n_ret, SWAP_BATCH);
>
> Here, the behaviour is changed too. In old allocation, partial
> allocation will jump out to return. In this patch, you try the percpu
> cluster firstly, then call scan_swap_map_slots() to try best and will
> jump out even though partial allocation succeed. But the allocation from
> scan_swap_map_slots() could happen on different si device, this looks
> bizarre. Do you think we need reconsider the design?

Right, that's a behavior change, but it only temporarily affects the
slot cache. get_swap_pages() will only be called with size > 1 when
order == 0, and only by the slot cache. (Large order allocations always
use size == 1; other users only use order == 0 && size == 1.) So I
didn't notice it, as this series is removing the slot cache.

The side effect of a partial allocation would be "returned slots will
be from different devices" and "slot_cache may get drained faster as
get_swap_pages may return fewer slots when percpu cluster is drained".
It might be a performance issue, but it seems slight and trivial; the
slot cache can still work.

And the next commit will just remove the slot cache, and the problem
will be gone. I think I can add a comment about it here?
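
To make the partial-allocation case above concrete, here is a small
userspace model of the flow. It is only a sketch and not the kernel
code: struct sim_dev, struct sim_pcp_cluster, struct sim_slot and all
sim_*() helpers are made-up names, the device list is a plain array
sorted by priority instead of a plist, there is no locking, and
SWAP_BATCH is assumed to be 64 to match the slot cache size mentioned
in the commit message. How the slow path result is merged into the
output array is also an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_BATCH 64 /* assumed value, matches the 64-entry slot cache */

/* Hypothetical stand-in for a swap device: an id and a free slot count. */
struct sim_dev {
	int id;
	int prio;
	int free_slots;
};

/* Hypothetical stand-in for the global percpu cluster of one CPU. */
struct sim_pcp_cluster {
	struct sim_dev *dev; /* device the cached cluster belongs to */
	int remaining;       /* free slots left in the cached cluster */
};

/* One allocated slot; only records which device it came from. */
struct sim_slot {
	int dev_id;
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Fast path: take up to n_goal slots from the cached cluster only. */
static int sim_alloc_fast(struct sim_pcp_cluster *pcp,
			  struct sim_slot *out, int n_goal)
{
	int n = 0;

	while (pcp->dev && pcp->remaining > 0 && n < n_goal) {
		pcp->remaining--;
		out[n++].dev_id = pcp->dev->id;
	}
	return n;
}

/*
 * Slow path: walk devices from highest to lowest priority and return
 * as soon as one device yields anything, even if fewer than n_goal.
 */
static int sim_alloc_slow(struct sim_dev *devs, size_t ndevs,
			  struct sim_slot *out, int n_goal)
{
	for (size_t i = 0; i < ndevs; i++) {
		int n = min_int(devs[i].free_slots, n_goal);

		if (n > 0) {
			devs[i].free_slots -= n;
			for (int k = 0; k < n; k++)
				out[k].dev_id = devs[i].id;
			return n;
		}
	}
	return 0;
}

/* Model of the patched flow: fast path first, then fall back. */
static int sim_get_swap_pages(struct sim_pcp_cluster *pcp,
			      struct sim_dev *devs, size_t ndevs,
			      struct sim_slot *out, int n_goal)
{
	int n_ret;

	assert(n_goal > 0);

	n_ret = sim_alloc_fast(pcp, out, n_goal);
	if (n_ret == n_goal)
		return n_ret;

	/* Only the remainder is requested from the slow path. */
	n_goal = min_int(n_goal - n_ret, SWAP_BATCH);
	n_ret += sim_alloc_slow(devs, ndevs, out + n_ret, n_goal);
	return n_ret;
}
```

In this model, if the cached cluster sits on device 1 with 3 slots left
and device 0 is the higher priority device with free space, a request
for 8 slots gets 3 slots from device 1 followed by 5 slots from device
0. That is the "returned slots will be from different devices" effect
described above.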